Re: consumer.poll() takes approx. 30 seconds - 0.9 new consumer api

2016-06-20 Thread Rohit Sardesai

Also, in the tests the produce operation starts first, and after 400 seconds
the consume requests start in parallel. The number of parallel requests is
driven by the number of partitions. From the logs, we see that the very first
consumer created to serve a read request takes around 29124 ms to return from
poll() with no records. Many subsequent reads by other consumer instances show
similar behaviour. We also printed the partitions assigned to every consumer
instance and observed that no partitions were assigned whenever a consumer
reported 0 records after poll() returned in about 30 seconds.

From: Rohit Sardesai <rohit.sarde...@outlook.com>
Sent: 20 June 2016 11:41:50
To: users@kafka.apache.org
Subject: Re: consumer.poll() takes approx. 30 seconds - 0.9 new consumer api


The consumer instances close (i.e. leave the group) only if they are idle for a
long time; we have expiration threads which monitor this and remove any
consumer instances that remain idle. Consumers are also closed when the
application is shut down. The poll() does receive around 481 records the second
time, but we process only 10 messages at a time, so the processing time is not
very large.

From: Ewen Cheslack-Postava <e...@confluent.io>
Sent: 20 June 2016 10:52:29
To: users@kafka.apache.org
Subject: Re: consumer.poll() takes approx. 30 seconds - 0.9 new consumer api

Rohit,

The 30s number sounds very suspicious because it is exactly the value of
the session timeout. But if you are driving the consumer correctly, you
shouldn't normally hit this timeout. Dana was asking about consumers
leaving gracefully because that is one case where you can inadvertently
trigger the 30s timeout, requiring *all* group members to wait that long
before they decide one of the previous members has left the group
ungracefully and move on without it.

It sounds like something you are doing is causing the group to wait for the
session timeout. Is it possible any of your processes are exiting without
calling consumer.close()? Or that any of your processes are not calling
consumer.poll() within the session timeout of 30s? This can sometimes
happen if they receive too much data and take too long to process it (0.10
introduced max.poll.records to help users control this, and we're making
further refinements to the consumer to provide better application control
over number of messages fetched vs total processing time).
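
As a rough illustration of that advice (a hedged sketch, not code from this
thread; the broker address, group, and topic names are made up, and
max.poll.records requires 0.10+):

import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class PollLoopSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker1:9092");            // assumed broker address
        props.put("group.id", "my-group");                         // assumed group name
        props.put("key.deserializer",
            "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
            "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("max.poll.records", "10");                       // bound the work done per poll (0.10+)

        KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
        consumer.subscribe(Collections.singletonList("my-topic")); // assumed topic name
        try {
            while (true) {
                // poll() must be called again well within session.timeout.ms (30s by
                // default); otherwise the coordinator declares this member dead and the
                // rest of the group waits out the timeout before it can rebalance.
                ConsumerRecords<String, String> records = consumer.poll(100);
                for (ConsumerRecord<String, String> record : records) {
                    System.out.println(record.offset() + ": " + record.value());
                }
            }
        } finally {
            consumer.close();                                      // leave the group cleanly
        }
    }
}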

-Ewen

On Sun, Jun 19, 2016 at 10:01 PM, Rohit Sardesai <rohit.sarde...@outlook.com
> wrote:

>
> Can anybody help out on this?
> 
> From: Rohit Sardesai
> Sent: 19 June 2016 11:47:01
> To: users@kafka.apache.org
> Subject: Re: consumer.poll() takes approx. 30 seconds - 0.9 new consumer
> api
>
>
> In my tests, I am using around 24 consumer groups. I never call
> consumer.close() or consumer.unsubscribe() until the application is
> shutting down.
>
> So the consumers never leave, but new consumer instances do get created as
> the parallel requests pile up. Also, I am reusing consumer instances if
> they are idle (i.e. not serving any consume request). So with 9 partitions,
> I do 9 consume requests in parallel every second under the same consumer
> group.
>
> To summarize, I have the following test setup: 3 Kafka brokers, 2 ZooKeeper
> nodes, 1 topic, 9 partitions, 24 consumer groups, and 9 consume requests at
> a time.
>
>
> 
> From: Dana Powers <dana.pow...@gmail.com>
> Sent: 19 June 2016 10:45
> To: users@kafka.apache.org
> Subject: Re: consumer.poll() takes approx. 30 seconds - 0.9 new consumer
> api
>
> Is your test reusing a group name? And if so, are your consumer instances
> gracefully leaving? This may cause subsequent 'rebalance' operations to
> block until those old consumers check-in or the session timeout happens
> (30secs)
>
> -Dana
> On Jun 18, 2016 8:56 PM, "Rohit Sardesai" <rohit.sarde...@outlook.com>
> wrote:
>
> > I am using the group management feature of Kafka 0.9 to handle partition
> > assignment to consumer instances. I use the subscribe() API to subscribe
> to
> > the topic I am interested in reading data from.  I have an environment
> > where I have 3 Kafka brokers  with a couple of Zookeeper nodes . I
> created
> > a topic with 9 partitions . The performance tests attempt to send 9
> > parallel poll() requests to the Kafka brokers every second. The results
> > show that each poll() operation takes around 30 seconds for the first
> time
> > it polls and returns 0 records. Also , when I print the partition
> > assignment to this consumer instance , I see no partitions assigned to
> it.
> > The next po

Re: consumer.poll() takes approx. 30 seconds - 0.9 new consumer api

2016-06-20 Thread Rohit Sardesai

The consumer instances close (i.e. leave the group) only if they are idle for a
long time; we have expiration threads which monitor this and remove any
consumer instances that remain idle. Consumers are also closed when the
application is shut down. The poll() does receive around 481 records the second
time, but we process only 10 messages at a time, so the processing time is not
very large.

From: Ewen Cheslack-Postava <e...@confluent.io>
Sent: 20 June 2016 10:52:29
To: users@kafka.apache.org
Subject: Re: consumer.poll() takes approx. 30 seconds - 0.9 new consumer api

Rohit,

The 30s number sounds very suspicious because it is exactly the value of
the session timeout. But if you are driving the consumer correctly, you
shouldn't normally hit this timeout. Dana was asking about consumers
leaving gracefully because that is one case where you can inadvertently
trigger the 30s timeout, requiring *all* group members to wait that long
before they decide one of the previous members has left the group
ungracefully and move on without it.

It sounds like something you are doing is causing the group to wait for the
session timeout. Is it possible any of your processes are exiting without
calling consumer.close()? Or that any of your processes are not calling
consumer.poll() within the session timeout of 30s? This can sometimes
happen if they receive too much data and take too long to process it (0.10
introduced max.poll.records to help users control this, and we're making
further refinements to the consumer to provide better application control
over number of messages fetched vs total processing time).

-Ewen

On Sun, Jun 19, 2016 at 10:01 PM, Rohit Sardesai <rohit.sarde...@outlook.com
> wrote:

>
> Can anybody help out on this?
> 
> From: Rohit Sardesai
> Sent: 19 June 2016 11:47:01
> To: users@kafka.apache.org
> Subject: Re: consumer.poll() takes approx. 30 seconds - 0.9 new consumer
> api
>
>
> In my tests, I am using around 24 consumer groups. I never call
> consumer.close() or consumer.unsubscribe() until the application is
> shutting down.
>
> So the consumers never leave, but new consumer instances do get created as
> the parallel requests pile up. Also, I am reusing consumer instances if
> they are idle (i.e. not serving any consume request). So with 9 partitions,
> I do 9 consume requests in parallel every second under the same consumer
> group.
>
> To summarize, I have the following test setup: 3 Kafka brokers, 2 ZooKeeper
> nodes, 1 topic, 9 partitions, 24 consumer groups, and 9 consume requests at
> a time.
>
>
> 
> From: Dana Powers <dana.pow...@gmail.com>
> Sent: 19 June 2016 10:45
> To: users@kafka.apache.org
> Subject: Re: consumer.poll() takes approx. 30 seconds - 0.9 new consumer
> api
>
> Is your test reusing a group name? And if so, are your consumer instances
> gracefully leaving? This may cause subsequent 'rebalance' operations to
> block until those old consumers check-in or the session timeout happens
> (30secs)
>
> -Dana
> On Jun 18, 2016 8:56 PM, "Rohit Sardesai" <rohit.sarde...@outlook.com>
> wrote:
>
> > I am using the group management feature of Kafka 0.9 to handle partition
> > assignment to consumer instances. I use the subscribe() API to subscribe
> to
> > the topic I am interested in reading data from.  I have an environment
> > where I have 3 Kafka brokers  with a couple of Zookeeper nodes . I
> created
> > a topic with 9 partitions . The performance tests attempt to send 9
> > parallel poll() requests to the Kafka brokers every second. The results
> > show that each poll() operation takes around 30 seconds for the first
> time
> > it polls and returns 0 records. Also , when I print the partition
> > assignment to this consumer instance , I see no partitions assigned to
> it.
> > The next poll() does return quickly ( ~ 10-20 ms) with data and some
> > partitions assigned to it.
> >
> > With each consumer taking 30 seconds , the performance tests report very
> > low throughput since I run the tests for around 1000 seconds out of which I
> > produce messages on the topic for the complete duration and I start the
> > parallel consume requests after 400 seconds. So out of 400 seconds ,
> with 9
> > consumers taking 30 seconds each , around 270 seconds are spent in the
> > first poll without any data. Is this because of the re-balance operation
> > that the consumers are blocked on the poll() ? What is the best way to
> use
> > poll()  if I have to serve many parallel requests per second ?  Should I
> > prefer manual assignment of partitions in this case instead of relying on
> > re-balance ?
> >
> >
> > Regards,
> >
> > Rohit Sardesai
> >
> >
>



--
Thanks,
Ewen


Re: consumer.poll() takes approx. 30 seconds - 0.9 new consumer api

2016-06-19 Thread Ewen Cheslack-Postava
Rohit,

The 30s number sounds very suspicious because it is exactly the value of
the session timeout. But if you are driving the consumer correctly, you
shouldn't normally hit this timeout. Dana was asking about consumers
leaving gracefully because that is one case where you can inadvertently
trigger the 30s timeout, requiring *all* group members to wait that long
before they decide one of the previous members has left the group
ungracefully and move on without it.

It sounds like something you are doing is causing the group to wait for the
session timeout. Is it possible any of your processes are exiting without
calling consumer.close()? Or that any of your processes are not calling
consumer.poll() within the session timeout of 30s? This can sometimes
happen if they receive too much data and take too long to process it (0.10
introduced max.poll.records to help users control this, and we're making
further refinements to the consumer to provide better application control
over number of messages fetched vs total processing time).

-Ewen

On Sun, Jun 19, 2016 at 10:01 PM, Rohit Sardesai <rohit.sarde...@outlook.com
> wrote:

>
> Can anybody help out on this?
> 
> From: Rohit Sardesai
> Sent: 19 June 2016 11:47:01
> To: users@kafka.apache.org
> Subject: Re: consumer.poll() takes approx. 30 seconds - 0.9 new consumer
> api
>
>
> In my tests, I am using around 24 consumer groups. I never call
> consumer.close() or consumer.unsubscribe() until the application is
> shutting down.
>
> So the consumers never leave, but new consumer instances do get created as
> the parallel requests pile up. Also, I am reusing consumer instances if
> they are idle (i.e. not serving any consume request). So with 9 partitions,
> I do 9 consume requests in parallel every second under the same consumer
> group.
>
> To summarize, I have the following test setup: 3 Kafka brokers, 2 ZooKeeper
> nodes, 1 topic, 9 partitions, 24 consumer groups, and 9 consume requests at
> a time.
>
>
> 
> From: Dana Powers <dana.pow...@gmail.com>
> Sent: 19 June 2016 10:45
> To: users@kafka.apache.org
> Subject: Re: consumer.poll() takes approx. 30 seconds - 0.9 new consumer
> api
>
> Is your test reusing a group name? And if so, are your consumer instances
> gracefully leaving? This may cause subsequent 'rebalance' operations to
> block until those old consumers check-in or the session timeout happens
> (30secs)
>
> -Dana
> On Jun 18, 2016 8:56 PM, "Rohit Sardesai" <rohit.sarde...@outlook.com>
> wrote:
>
> > I am using the group management feature of Kafka 0.9 to handle partition
> > assignment to consumer instances. I use the subscribe() API to subscribe
> to
> > the topic I am interested in reading data from.  I have an environment
> > where I have 3 Kafka brokers  with a couple of Zookeeper nodes . I
> created
> > a topic with 9 partitions . The performance tests attempt to send 9
> > parallel poll() requests to the Kafka brokers every second. The results
> > show that each poll() operation takes around 30 seconds for the first
> time
> > it polls and returns 0 records. Also , when I print the partition
> > assignment to this consumer instance , I see no partitions assigned to
> it.
> > The next poll() does return quickly ( ~ 10-20 ms) with data and some
> > partitions assigned to it.
> >
> > With each consumer taking 30 seconds , the performance tests report very
> > low throughput since I run the tests for around 1000 seconds out of which I
> > produce messages on the topic for the complete duration and I start the
> > parallel consume requests after 400 seconds. So out of 400 seconds ,
> with 9
> > consumers taking 30 seconds each , around 270 seconds are spent in the
> > first poll without any data. Is this because of the re-balance operation
> > that the consumers are blocked on the poll() ? What is the best way to
> use
> > poll()  if I have to serve many parallel requests per second ?  Should I
> > prefer manual assignment of partitions in this case instead of relying on
> > re-balance ?
> >
> >
> > Regards,
> >
> > Rohit Sardesai
> >
> >
>



-- 
Thanks,
Ewen


Re: consumer.poll() takes approx. 30 seconds - 0.9 new consumer api

2016-06-19 Thread Rohit Sardesai

Can anybody help out on this?

From: Rohit Sardesai
Sent: 19 June 2016 11:47:01
To: users@kafka.apache.org
Subject: Re: consumer.poll() takes approx. 30 seconds - 0.9 new consumer api


In my tests, I am using around 24 consumer groups. I never call
consumer.close() or consumer.unsubscribe() until the application is shutting
down.

So the consumers never leave, but new consumer instances do get created as the
parallel requests pile up. Also, I am reusing consumer instances if they are
idle (i.e. not serving any consume request). So with 9 partitions, I do 9
consume requests in parallel every second under the same consumer group.

To summarize, I have the following test setup: 3 Kafka brokers, 2 ZooKeeper
nodes, 1 topic, 9 partitions, 24 consumer groups, and 9 consume requests at a
time.



From: Dana Powers <dana.pow...@gmail.com>
Sent: 19 June 2016 10:45
To: users@kafka.apache.org
Subject: Re: consumer.poll() takes approx. 30 seconds - 0.9 new consumer api

Is your test reusing a group name? And if so, are your consumer instances
gracefully leaving? This may cause subsequent 'rebalance' operations to
block until those old consumers check-in or the session timeout happens
(30secs)

-Dana
On Jun 18, 2016 8:56 PM, "Rohit Sardesai" <rohit.sarde...@outlook.com>
wrote:

> I am using the group management feature of Kafka 0.9 to handle partition
> assignment to consumer instances. I use the subscribe() API to subscribe to
> the topic I am interested in reading data from.  I have an environment
> where I have 3 Kafka brokers  with a couple of Zookeeper nodes . I created
> a topic with 9 partitions . The performance tests attempt to send 9
> parallel poll() requests to the Kafka brokers every second. The results
> show that each poll() operation takes around 30 seconds for the first time
> it polls and returns 0 records. Also , when I print the partition
> assignment to this consumer instance , I see no partitions assigned to it.
> The next poll() does return quickly ( ~ 10-20 ms) with data and some
> partitions assigned to it.
>
> With each consumer taking 30 seconds , the performance tests report very
> low throughput since I run the tests for around 1000 seconds out of which I
> produce messages on the topic for the complete duration and I start the
> parallel consume requests after 400 seconds. So out of 400 seconds , with 9
> consumers taking 30 seconds each , around 270 seconds are spent in the
> first poll without any data. Is this because of the re-balance operation
> that the consumers are blocked on the poll() ? What is the best way to use
> poll()  if I have to serve many parallel requests per second ?  Should I
> prefer manual assignment of partitions in this case instead of relying on
> re-balance ?
>
>
> Regards,
>
> Rohit Sardesai
>
>


Re: consumer.poll() takes approx. 30 seconds - 0.9 new consumer api

2016-06-19 Thread Rohit Sardesai
In my tests, I am using around 24 consumer groups. I never call
consumer.close() or consumer.unsubscribe() until the application is shutting
down.

So the consumers never leave, but new consumer instances do get created as the
parallel requests pile up. Also, I am reusing consumer instances if they are
idle (i.e. not serving any consume request). So with 9 partitions, I do 9
consume requests in parallel every second under the same consumer group.

To summarize, I have the following test setup: 3 Kafka brokers, 2 ZooKeeper
nodes, 1 topic, 9 partitions, 24 consumer groups, and 9 consume requests at a
time.



From: Dana Powers <dana.pow...@gmail.com>
Sent: 19 June 2016 10:45
To: users@kafka.apache.org
Subject: Re: consumer.poll() takes approx. 30 seconds - 0.9 new consumer api

Is your test reusing a group name? And if so, are your consumer instances
gracefully leaving? This may cause subsequent 'rebalance' operations to
block until those old consumers check-in or the session timeout happens
(30secs)

-Dana
On Jun 18, 2016 8:56 PM, "Rohit Sardesai" <rohit.sarde...@outlook.com>
wrote:

> I am using the group management feature of Kafka 0.9 to handle partition
> assignment to consumer instances. I use the subscribe() API to subscribe to
> the topic I am interested in reading data from.  I have an environment
> where I have 3 Kafka brokers  with a couple of Zookeeper nodes . I created
> a topic with 9 partitions . The performance tests attempt to send 9
> parallel poll() requests to the Kafka brokers every second. The results
> show that each poll() operation takes around 30 seconds for the first time
> it polls and returns 0 records. Also , when I print the partition
> assignment to this consumer instance , I see no partitions assigned to it.
> The next poll() does return quickly ( ~ 10-20 ms) with data and some
> partitions assigned to it.
>
> With each consumer taking 30 seconds , the performance tests report very
> low throughput since I run the tests for around 1000 seconds out of which I
> produce messages on the topic for the complete duration and I start the
> parallel consume requests after 400 seconds. So out of 400 seconds , with 9
> consumers taking 30 seconds each , around 270 seconds are spent in the
> first poll without any data. Is this because of the re-balance operation
> that the consumers are blocked on the poll() ? What is the best way to use
> poll()  if I have to serve many parallel requests per second ?  Should I
> prefer manual assignment of partitions in this case instead of relying on
> re-balance ?
>
>
> Regards,
>
> Rohit Sardesai
>
>


Re: consumer.poll() takes approx. 30 seconds - 0.9 new consumer api

2016-06-18 Thread Dana Powers
Is your test reusing a group name? And if so, are your consumer instances
gracefully leaving? This may cause subsequent 'rebalance' operations to
block until those old consumers check-in or the session timeout happens
(30secs)
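
For reference, a hedged sketch of what leaving gracefully can look like with
the 0.9 consumer (a fragment, assuming a consumer Properties object named
props configured as in the earlier sketch; the names are placeholders and this
is not code from this thread): wake up a blocked poll() from a shutdown hook,
then close() so the group can rebalance without waiting out the session
timeout.

final KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);

// On JVM shutdown, interrupt a blocked poll() so the loop below can exit.
Runtime.getRuntime().addShutdownHook(new Thread(new Runnable() {
    public void run() {
        consumer.wakeup();
    }
}));

try {
    consumer.subscribe(java.util.Collections.singletonList("my-topic"));
    while (true) {
        consumer.poll(100); // process the returned ConsumerRecords here
    }
} catch (org.apache.kafka.common.errors.WakeupException e) {
    // expected during shutdown
} finally {
    consumer.close();       // tells the coordinator this member is leaving
}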

-Dana
On Jun 18, 2016 8:56 PM, "Rohit Sardesai" 
wrote:

> I am using the group management feature of Kafka 0.9 to handle partition
> assignment to consumer instances. I use the subscribe() API to subscribe to
> the topic I am interested in reading data from.  I have an environment
> where I have 3 Kafka brokers  with a couple of Zookeeper nodes . I created
> a topic with 9 partitions . The performance tests attempt to send 9
> parallel poll() requests to the Kafka brokers every second. The results
> show that each poll() operation takes around 30 seconds for the first time
> it polls and returns 0 records. Also , when I print the partition
> assignment to this consumer instance , I see no partitions assigned to it.
> The next poll() does return quickly ( ~ 10-20 ms) with data and some
> partitions assigned to it.
>
> With each consumer taking 30 seconds , the performance tests report very
> low throughput since I run the tests for around 1000 seconds out of which I
> produce messages on the topic for the complete duration and I start the
> parallel consume requests after 400 seconds. So out of 400 seconds , with 9
> consumers taking 30 seconds each , around 270 seconds are spent in the
> first poll without any data. Is this because of the re-balance operation
> that the consumers are blocked on the poll() ? What is the best way to use
> poll()  if I have to serve many parallel requests per second ?  Should I
> prefer manual assignment of partitions in this case instead of relying on
> re-balance ?
>
>
> Regards,
>
> Rohit Sardesai
>
>


consumer.poll() takes approx. 30 seconds - 0.9 new consumer api

2016-06-18 Thread Rohit Sardesai
I am using the group management feature of Kafka 0.9 to handle partition
assignment to consumer instances. I use the subscribe() API to subscribe to the
topic I am interested in reading data from. I have an environment with 3 Kafka
brokers and a couple of ZooKeeper nodes. I created a topic with 9 partitions.
The performance tests attempt to send 9 parallel poll() requests to the Kafka
brokers every second. The results show that each poll() operation takes around
30 seconds the first time it polls and returns 0 records. Also, when I print
the partition assignment to this consumer instance, I see no partitions
assigned to it. The next poll() does return quickly (~10-20 ms) with data and
some partitions assigned to it.

With each consumer taking 30 seconds, the performance tests report very low
throughput, since I run the tests for around 1000 seconds, out of which I
produce messages on the topic for the complete duration and start the parallel
consume requests after 400 seconds. So out of 400 seconds, with 9 consumers
taking 30 seconds each, around 270 seconds are spent in the first poll without
any data. Is it because of the rebalance operation that the consumers are
blocked on poll()? What is the best way to use poll() if I have to serve many
parallel requests per second? Should I prefer manual assignment of partitions
in this case instead of relying on rebalance?
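
One hedged sketch of the manual-assignment alternative asked about above (a
fragment, assuming a consumer configured with the usual properties; the topic
name and partition numbers are placeholders). With assign() there is no group
join and no rebalance, so the first poll() does not block on the coordinator,
but partition failover and offset tracking become the application's
responsibility:

// Assumes the usual consumer imports plus org.apache.kafka.common.TopicPartition
TopicPartition p0 = new TopicPartition("my-topic", 0);
TopicPartition p1 = new TopicPartition("my-topic", 1);
consumer.assign(java.util.Arrays.asList(p0, p1)); // no subscribe(), no group coordination
consumer.seek(p0, 0L);                            // position explicitly if needed
consumer.seek(p1, 0L);
ConsumerRecords<String, String> records = consumer.poll(100); // fetches only from p0/p1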


Regards,

Rohit Sardesai



Re: Consumer group ACL limited to new consumer API?

2016-05-19 Thread David Hawes
I'd be happy to do that, but in this case it looks like the next
release has it covered:

https://www.elastic.co/blog/logstash-5-0-0-alpha1-released

(See the Kafka 0.9 section)

On 19 May 2016 at 10:50, Tom Crayford <tcrayf...@heroku.com> wrote:
> You could always contribute back to logstash - I'm sure they'd appreciate
> it.
>
> On Thu, May 19, 2016 at 3:47 PM, David Hawes <dha...@vt.edu> wrote:
>
>> Thanks for the confirmation.
>>
>> I like the idea about only allowing authenticated consumers
>> (definitely what I want). Unfortunately, I'm running Kafka with an ELK
>> installation and was hoping for some kind of stopgap while the
>> logstash input plugins catch up and support TLS. When the logstash
>> kafka plugin supports TLS, this sounds like a viable option.
>>
>> On 19 May 2016 at 08:55, Tom Crayford <tcrayf...@heroku.com> wrote:
>> > Hi there,
>> >
>> > One way to disable the old consumer is to only allow authenticated
>> > consumers (via SSL or another authentication system) - the old consumers
>> > don't support authentication at all. If you care about ACLs anyway, you
>> > probably don't want unauthenticated consumers or producers in the system
>> at
>> > all.
>> >
>> > The ACL for sure only works on the new consumer API, because the old one
>> > talks directly to zookeeper so there's no good way to apply the same ACLs
>> > there.
>> >
>> > Thanks
>> >
>> > Tom Crayford
>> > Heroku Kafka
>> >
>> > On Thu, May 19, 2016 at 1:28 AM, David Hawes <dha...@vt.edu> wrote:
>> >
>> >> I have been playing around with ACLs and was hoping to limit access to
>> >> a topic and consumer group by IP, but was unable to get it working.
>> >> Basically, I was able to Read from a topic as a consumer group that
>> >> was not allowed.
>> >>
>> >> KIP-11 has the following line about consumer groups:
>> >>
>> >> In order to consume from a topic using the new consumer API, the
>> >> principal will need: READ on TOPIC and READ on CONSUMER-GROUP.
>> >>
>> >> This tipped me off that the ACL may only work with the new consumer
>> >> API, which I was not using. Sure enough, using the new consumer API
>> >> denied my access by consumer group until I added an appropriate ACL.
>> >>
>> >> Is there some way to disable the old consumer API in Kafka? I see the
>> >> inter.broker.protocol.version directive, but nothing about clients.
>> >> Will there ever be support for group ACLs with the old consumer API?
>> >>
>> >> Without some way to disable the old consumer from being used, the
>> >> consumer group ACLs are effectively useless as of version 0.9.0.1.
>> >>
>>


Re: Consumer group ACL limited to new consumer API?

2016-05-19 Thread David Hawes
Thanks for the confirmation.

I like the idea about only allowing authenticated consumers
(definitely what I want). Unfortunately, I'm running Kafka with an ELK
installation and was hoping for some kind of stopgap while the
logstash input plugins catch up and support TLS. When the logstash
kafka plugin supports TLS, this sounds like a viable option.

On 19 May 2016 at 08:55, Tom Crayford <tcrayf...@heroku.com> wrote:
> Hi there,
>
> One way to disable the old consumer is to only allow authenticated
> consumers (via SSL or another authentication system) - the old consumers
> don't support authentication at all. If you care about ACLs anyway, you
> probably don't want unauthenticated consumers or producers in the system at
> all.
>
> The ACL for sure only works on the new consumer API, because the old one
> talks directly to zookeeper so there's no good way to apply the same ACLs
> there.
>
> Thanks
>
> Tom Crayford
> Heroku Kafka
>
> On Thu, May 19, 2016 at 1:28 AM, David Hawes <dha...@vt.edu> wrote:
>
>> I have been playing around with ACLs and was hoping to limit access to
>> a topic and consumer group by IP, but was unable to get it working.
>> Basically, I was able to Read from a topic as a consumer group that
>> was not allowed.
>>
>> KIP-11 has the following line about consumer groups:
>>
>> In order to consume from a topic using the new consumer API, the
>> principal will need: READ on TOPIC and READ on CONSUMER-GROUP.
>>
>> This tipped me off that the ACL may only work with the new consumer
>> API, which I was not using. Sure enough, using the new consumer API
>> denied my access by consumer group until I added an appropriate ACL.
>>
>> Is there some way to disable the old consumer API in Kafka? I see the
>> inter.broker.protocol.version directive, but nothing about clients.
>> Will there ever be support for group ACLs with the old consumer API?
>>
>> Without some way to disable the old consumer from being used, the
>> consumer group ACLs are effectively useless as of version 0.9.0.1.
>>


Re: Consumer group ACL limited to new consumer API?

2016-05-19 Thread Tom Crayford
Hi there,

One way to disable the old consumer is to only allow authenticated
consumers (via SSL or another authentication system) - the old consumers
don't support authentication at all. If you care about ACLs anyway, you
probably don't want unauthenticated consumers or producers in the system at
all.

The ACL for sure only works on the new consumer API, because the old one
talks directly to zookeeper so there's no good way to apply the same ACLs
there.
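
A hedged sketch of what that can look like in server.properties (the hostname
is a placeholder, keystore/truststore settings are omitted, and this is only an
illustration of the idea, not a complete or verified configuration):

# Expose only an SSL listener so unauthenticated clients (including the old
# ZooKeeper-based consumer, which has no SSL support) cannot connect.
listeners=SSL://broker1.example.com:9093
security.inter.broker.protocol=SSL
ssl.client.auth=required
# Enforce ACLs and deny anything without an explicit rule.
authorizer.class.name=kafka.security.auth.SimpleAclAuthorizer
allow.everyone.if.no.acl.found=false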

Thanks

Tom Crayford
Heroku Kafka

On Thu, May 19, 2016 at 1:28 AM, David Hawes <dha...@vt.edu> wrote:

> I have been playing around with ACLs and was hoping to limit access to
> a topic and consumer group by IP, but was unable to get it working.
> Basically, I was able to Read from a topic as a consumer group that
> was not allowed.
>
> KIP-11 has the following line about consumer groups:
>
> In order to consume from a topic using the new consumer API, the
> principal will need: READ on TOPIC and READ on CONSUMER-GROUP.
>
> This tipped me off that the ACL may only work with the new consumer
> API, which I was not using. Sure enough, using the new consumer API
> denied my access by consumer group until I added an appropriate ACL.
>
> Is there some way to disable the old consumer API in Kafka? I see the
> inter.broker.protocol.version directive, but nothing about clients.
> Will there ever be support for group ACLs with the old consumer API?
>
> Without some way to disable the old consumer from being used, the
> consumer group ACLs are effectively useless as of version 0.9.0.1.
>


Consumer group ACL limited to new consumer API?

2016-05-18 Thread David Hawes
I have been playing around with ACLs and was hoping to limit access to
a topic and consumer group by IP, but was unable to get it working.
Basically, I was able to Read from a topic as a consumer group that
was not allowed.

KIP-11 has the following line about consumer groups:

In order to consume from a topic using the new consumer API, the
principal will need: READ on TOPIC and READ on CONSUMER-GROUP.

This tipped me off that the ACL may only work with the new consumer
API, which I was not using. Sure enough, using the new consumer API
denied my access by consumer group until I added an appropriate ACL.
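
As an illustration of the ACLs the KIP-11 rule implies (the principal, topic,
group, and ZooKeeper address below are placeholders, and flag names may vary
between releases):

bin/kafka-acls.sh --authorizer-properties zookeeper.connect=zk1:2181 \
  --add --allow-principal User:alice --operation Read --topic my-topic
bin/kafka-acls.sh --authorizer-properties zookeeper.connect=zk1:2181 \
  --add --allow-principal User:alice --operation Read --group my-group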

Is there some way to disable the old consumer API in Kafka? I see the
inter.broker.protocol.version directive, but nothing about clients.
Will there ever be support for group ACLs with the old consumer API?

Without some way to disable the old consumer from being used, the
consumer group ACLs are effectively useless as of version 0.9.0.1.


RE: New consumer API waits indefinitely

2016-04-13 Thread Lohith Samaga M
Dear All,
After a system restart, the new consumer is working as expected.

Best regards / Mit freundlichen Grüßen / Sincères salutations
M. Lohith Samaga




-Original Message-
From: Lohith Samaga M [mailto:lohith.sam...@mphasis.com] 
Sent: Tuesday, April 12, 2016 17.00
To: users@kafka.apache.org
Subject: RE: New consumer API waits indefinitely

Dear All,
I installed Kafka on a Linux VM.
Here too:
1. The producer is able to store messages in the topic (sent from 
Windows host).
2. The consumer is unable to read it either from Windows host or from 
kafka-console-consumer on the Linux VM console.

In the logs, I see:
[2016-04-12 16:51:00,672] INFO [GroupCoordinator 0]: Stabilized group 
console-consumer-39913 generation 1 (kafka.coordinator.GroupCoordinator)
[2016-04-12 16:51:00,676] INFO [GroupCoordinator 0]: Assignment received from 
leader for group console-consumer-39913 for generation 1 
(kafka.coordinator.GroupCoordinator)
[2016-04-12 16:51:09,638] INFO [GroupCoordinator 0]: Preparing to restabilize 
group console-consumer-39913 with old generation 1 
(kafka.coordinator.GroupCoordinator)
[2016-04-12 16:51:09,640] INFO [GroupCoordinator 0]: Group 
console-consumer-39913 generation 1 is dead and removed 
(kafka.coordinator.GroupCoordinator)
[2016-04-12 16:53:08,489] INFO [Group Metadata Manager on Broker 0]: Removed 0 
expired offsets in 1 milliseconds. (kafka.coordinator.GroupMetadataManager)

When I run my Java code, I still get the exception - 
org.apache.kafka.clients.consumer.internals.SendFailedException


So, is it advisable to use the old consumer on Kafka 0.9.0.1?

Please help.

Thanks in advance.


Best regards / Mit freundlichen Grüßen / Sincères salutations M. Lohith Samaga



-Original Message-
From: Lohith Samaga M [mailto:lohith.sam...@mphasis.com]
Sent: Tuesday, April 05, 2016 13.36
To: users@kafka.apache.org
Subject: RE: New consumer API waits indefinitely

Hi Ismael, Niko,
After cleaning up the zookeeper and kafka logs, I do not get the below 
server exception anymore. I think Kafka did not like me opening the .log file 
in notepad.

The only exception that I now get is 
org.apache.kafka.clients.consumer.internals.SendFailedException in 
org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.RequestFutureCompletionHandler.
After that, the consumer goes into a loop.

Best regards / Mit freundlichen Grüßen / Sincères salutations M. Lohith Samaga



-Original Message-
From: Lohith Samaga M [mailto:lohith.sam...@mphasis.com]
Sent: Tuesday, April 05, 2016 12.38
To: users@kafka.apache.org
Subject: RE: New consumer API waits indefinitely

Hi Ismael,
I see the following exception when I (re)start Kafka (even a fresh 
setup after the previous one). And where is the configuration to set the data 
directory for Kafka (not the logs)?

java.io.IOException: The requested operation cannot be performed on a file with a user-mapped section open
at java.io.RandomAccessFile.setLength(Native Method)
at kafka.log.OffsetIndex$$anonfun$resize$1.apply(OffsetIndex.scala:285)
at kafka.log.OffsetIndex$$anonfun$resize$1.apply(OffsetIndex.scala:276)
at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:262)
at kafka.log.OffsetIndex.resize(OffsetIndex.scala:276)
at kafka.log.OffsetIndex$$anonfun$trimToValidSize$1.apply$mcV$sp(OffsetIndex.scala:265)
at kafka.log.OffsetIndex$$anonfun$trimToValidSize$1.apply(OffsetIndex.scala:265)
at kafka.log.OffsetIndex$$anonfun$trimToValidSize$1.apply(OffsetIndex.scala:265)
at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:262)
at kafka.log.OffsetIndex.trimToValidSize(OffsetIndex.scala:264)
at kafka.log.LogSegment.recover(LogSegment.scala:199)
at kafka.log.Log$$anonfun$loadSegments$4.apply(Log.scala:188)
at kafka.log.Log$$anonfun$loadSegments$4.apply(Log.scala:160)
at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:778)
at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)
at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:777)
at kafka.log.Log.loadSegments(Log.scala:160)
at kafka.log.Log.<init>(Log.scala:90)
at kafka.log.LogManager.createLog(LogManager.scala:357)
at kafka.cluster.Partition.getOrCreateReplica(Partition.scala:91)
at kafka.cluster.Partition$$anonfun$4$$anonfun$apply$2.apply(Partition.scala:173)
at kafka.cluster.Partition$$anonfun$4$$anonfun$apply$2.apply(Partition.scala:173)
at scala.collection.immutable.Set$Set1.foreach(Set.scala:79)
at kafka.cluster.Partition$$anonfun$4.apply(Partition.scala:173)
at kafka.cluster.Partition$$anonfun$4.apply

RE: New consumer API waits indefinitely

2016-04-12 Thread Lohith Samaga M
Dear All,
I installed Kafka on a Linux VM.
Here too:
1. The producer is able to store messages in the topic (sent from 
Windows host).
2. The consumer is unable to read it either from Windows host or from 
kafka-console-consumer on the Linux VM console.

In the logs, I see:
[2016-04-12 16:51:00,672] INFO [GroupCoordinator 0]: Stabilized group 
console-consumer-39913 generation 1 (kafka.coordinator.GroupCoordinator)
[2016-04-12 16:51:00,676] INFO [GroupCoordinator 0]: Assignment received from 
leader for group console-consumer-39913 for generation 1 
(kafka.coordinator.GroupCoordinator)
[2016-04-12 16:51:09,638] INFO [GroupCoordinator 0]: Preparing to restabilize 
group console-consumer-39913 with old generation 1 
(kafka.coordinator.GroupCoordinator)
[2016-04-12 16:51:09,640] INFO [GroupCoordinator 0]: Group 
console-consumer-39913 generation 1 is dead and removed 
(kafka.coordinator.GroupCoordinator)
[2016-04-12 16:53:08,489] INFO [Group Metadata Manager on Broker 0]: Removed 0 
expired offsets in 1 milliseconds. (kafka.coordinator.GroupMetadataManager)

When I run my Java code, I still get the exception - 
org.apache.kafka.clients.consumer.internals.SendFailedException


So, is it advisable to use the old consumer on Kafka 0.9.0.1?

Please help.

Thanks in advance.


Best regards / Mit freundlichen Grüßen / Sincères salutations
M. Lohith Samaga



-Original Message-
From: Lohith Samaga M [mailto:lohith.sam...@mphasis.com] 
Sent: Tuesday, April 05, 2016 13.36
To: users@kafka.apache.org
Subject: RE: New consumer API waits indefinitely

Hi Ismael, Niko,
After cleaning up the zookeeper and kafka logs, I do not get the below 
server exception anymore. I think Kafka did not like me opening the .log file 
in notepad.

The only exception that I now get is 
org.apache.kafka.clients.consumer.internals.SendFailedException in 
org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.RequestFutureCompletionHandler.
After that, the consumer goes into a loop.

Best regards / Mit freundlichen Grüßen / Sincères salutations M. Lohith Samaga



-Original Message-
From: Lohith Samaga M [mailto:lohith.sam...@mphasis.com]
Sent: Tuesday, April 05, 2016 12.38
To: users@kafka.apache.org
Subject: RE: New consumer API waits indefinitely

Hi Ismael,
I see the following exception when I (re)start Kafka (even a fresh 
setup after the previous one). And where is the configuration to set the data 
directory for Kafka (not the logs)?

java.io.IOException: The requested operation cannot be performed on a file with a user-mapped section open
at java.io.RandomAccessFile.setLength(Native Method)
at kafka.log.OffsetIndex$$anonfun$resize$1.apply(OffsetIndex.scala:285)
at kafka.log.OffsetIndex$$anonfun$resize$1.apply(OffsetIndex.scala:276)
at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:262)
at kafka.log.OffsetIndex.resize(OffsetIndex.scala:276)
at kafka.log.OffsetIndex$$anonfun$trimToValidSize$1.apply$mcV$sp(OffsetIndex.scala:265)
at kafka.log.OffsetIndex$$anonfun$trimToValidSize$1.apply(OffsetIndex.scala:265)
at kafka.log.OffsetIndex$$anonfun$trimToValidSize$1.apply(OffsetIndex.scala:265)
at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:262)
at kafka.log.OffsetIndex.trimToValidSize(OffsetIndex.scala:264)
at kafka.log.LogSegment.recover(LogSegment.scala:199)
at kafka.log.Log$$anonfun$loadSegments$4.apply(Log.scala:188)
at kafka.log.Log$$anonfun$loadSegments$4.apply(Log.scala:160)
at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:778)
at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)
at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:777)
at kafka.log.Log.loadSegments(Log.scala:160)
at kafka.log.Log.<init>(Log.scala:90)
at kafka.log.LogManager.createLog(LogManager.scala:357)
at kafka.cluster.Partition.getOrCreateReplica(Partition.scala:91)
at kafka.cluster.Partition$$anonfun$4$$anonfun$apply$2.apply(Partition.scala:173)
at kafka.cluster.Partition$$anonfun$4$$anonfun$apply$2.apply(Partition.scala:173)
at scala.collection.immutable.Set$Set1.foreach(Set.scala:79)
at kafka.cluster.Partition$$anonfun$4.apply(Partition.scala:173)
at kafka.cluster.Partition$$anonfun$4.apply(Partition.scala:165)
at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:262)
at kafka.utils.CoreUtils$.inWriteLock(CoreUtils.scala:270)
at kafka.cluster.Partition.makeLeader(Partition.scala:165)
at kafka.server.ReplicaManager$$anonfun$makeLeaders$4.apply(ReplicaManager.scala:692)
at kafka.server.ReplicaManager$$anonfun$makeLeaders

RE: New consumer API waits indefinitely

2016-04-05 Thread Lohith Samaga M
Hi Ismael, Niko,
After cleaning up the zookeeper and kafka logs, I do not get the below 
server exception anymore. I think Kafka did not like me opening the .log file 
in notepad.

The only exception that I now get is 
org.apache.kafka.clients.consumer.internals.SendFailedException in 
org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.RequestFutureCompletionHandler.
After that, the consumer goes into a loop.

Best regards / Mit freundlichen Grüßen / Sincères salutations
M. Lohith Samaga



-Original Message-
From: Lohith Samaga M [mailto:lohith.sam...@mphasis.com] 
Sent: Tuesday, April 05, 2016 12.38
To: users@kafka.apache.org
Subject: RE: New consumer API waits indefinitely

Hi Ismael,
I see the following exception when I (re)start Kafka (even a fresh 
setup after the previous one). And where is the configuration to set the data 
directory for Kafka (not the logs)?

java.io.IOException: The requested operation cannot be performed on a file with a user-mapped section open
at java.io.RandomAccessFile.setLength(Native Method)
at kafka.log.OffsetIndex$$anonfun$resize$1.apply(OffsetIndex.scala:285)
at kafka.log.OffsetIndex$$anonfun$resize$1.apply(OffsetIndex.scala:276)
at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:262)
at kafka.log.OffsetIndex.resize(OffsetIndex.scala:276)
at kafka.log.OffsetIndex$$anonfun$trimToValidSize$1.apply$mcV$sp(OffsetIndex.scala:265)
at kafka.log.OffsetIndex$$anonfun$trimToValidSize$1.apply(OffsetIndex.scala:265)
at kafka.log.OffsetIndex$$anonfun$trimToValidSize$1.apply(OffsetIndex.scala:265)
at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:262)
at kafka.log.OffsetIndex.trimToValidSize(OffsetIndex.scala:264)
at kafka.log.LogSegment.recover(LogSegment.scala:199)
at kafka.log.Log$$anonfun$loadSegments$4.apply(Log.scala:188)
at kafka.log.Log$$anonfun$loadSegments$4.apply(Log.scala:160)
at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:778)
at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)
at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:777)
at kafka.log.Log.loadSegments(Log.scala:160)
at kafka.log.Log.<init>(Log.scala:90)
at kafka.log.LogManager.createLog(LogManager.scala:357)
at kafka.cluster.Partition.getOrCreateReplica(Partition.scala:91)
at kafka.cluster.Partition$$anonfun$4$$anonfun$apply$2.apply(Partition.scala:173)
at kafka.cluster.Partition$$anonfun$4$$anonfun$apply$2.apply(Partition.scala:173)
at scala.collection.immutable.Set$Set1.foreach(Set.scala:79)
at kafka.cluster.Partition$$anonfun$4.apply(Partition.scala:173)
at kafka.cluster.Partition$$anonfun$4.apply(Partition.scala:165)
at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:262)
at kafka.utils.CoreUtils$.inWriteLock(CoreUtils.scala:270)
at kafka.cluster.Partition.makeLeader(Partition.scala:165)
at kafka.server.ReplicaManager$$anonfun$makeLeaders$4.apply(ReplicaManager.scala:692)
at kafka.server.ReplicaManager$$anonfun$makeLeaders$4.apply(ReplicaManager.scala:691)
at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:99)
at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:99)
at scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:230)
at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:40)
at scala.collection.mutable.HashMap.foreach(HashMap.scala:99)
at kafka.server.ReplicaManager.makeLeaders(ReplicaManager.scala:691)
at kafka.server.ReplicaManager.becomeLeaderOrFollower(ReplicaManager.scala:637)
at kafka.server.KafkaApis.handleLeaderAndIsrRequest(KafkaApis.scala:131)
at kafka.server.KafkaApis.handle(KafkaApis.scala:72)
at kafka.server.KafkaRequestHandler.run(KafkaRequestHandler.scala:60)
at java.lang.Thread.run(Thread.java:724)




Best regards / Mit freundlichen Grüßen / Sincères salutations M. Lohith Samaga



-Original Message-
From: isma...@gmail.com [mailto:isma...@gmail.com] On Behalf Of Ismael Juma
Sent: Monday, April 04, 2016 17.21
To: users@kafka.apache.org
Subject: Re: New consumer API waits indefinitely

Hi Lohith,

Are there any errors in your broker logs? I think there may be some issues with 
compacted topics on Windows and the new consumer uses a compacted topic to 
store offsets.

Ismael

On Mon, Apr 4, 2016 at 12:20 PM, Lohith Samaga M <lohith.sam...@mphasis.com>
wrote:

> Dear All,
> The error seems to be NOT_COORDINATOR_FOR_GROUP.
> The exception thrown in
> org.apache.kafka.clients.consumer.inter

RE: New consumer API waits indefinitely

2016-04-05 Thread Lohith Samaga M
Hi Ismael,
I see the following exception when I (re)start Kafka (even a fresh 
setup after the previous one). And where is the configuration to set the data 
directory for Kafka (not the logs)?

java.io.IOException: The requested operation cannot be performed on a file with a user-mapped section open
at java.io.RandomAccessFile.setLength(Native Method)
at kafka.log.OffsetIndex$$anonfun$resize$1.apply(OffsetIndex.scala:285)
at kafka.log.OffsetIndex$$anonfun$resize$1.apply(OffsetIndex.scala:276)
at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:262)
at kafka.log.OffsetIndex.resize(OffsetIndex.scala:276)
at kafka.log.OffsetIndex$$anonfun$trimToValidSize$1.apply$mcV$sp(OffsetIndex.scala:265)
at kafka.log.OffsetIndex$$anonfun$trimToValidSize$1.apply(OffsetIndex.scala:265)
at kafka.log.OffsetIndex$$anonfun$trimToValidSize$1.apply(OffsetIndex.scala:265)
at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:262)
at kafka.log.OffsetIndex.trimToValidSize(OffsetIndex.scala:264)
at kafka.log.LogSegment.recover(LogSegment.scala:199)
at kafka.log.Log$$anonfun$loadSegments$4.apply(Log.scala:188)
at kafka.log.Log$$anonfun$loadSegments$4.apply(Log.scala:160)
at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:778)
at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)
at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:777)
at kafka.log.Log.loadSegments(Log.scala:160)
at kafka.log.Log.<init>(Log.scala:90)
at kafka.log.LogManager.createLog(LogManager.scala:357)
at kafka.cluster.Partition.getOrCreateReplica(Partition.scala:91)
at kafka.cluster.Partition$$anonfun$4$$anonfun$apply$2.apply(Partition.scala:173)
at kafka.cluster.Partition$$anonfun$4$$anonfun$apply$2.apply(Partition.scala:173)
at scala.collection.immutable.Set$Set1.foreach(Set.scala:79)
at kafka.cluster.Partition$$anonfun$4.apply(Partition.scala:173)
at kafka.cluster.Partition$$anonfun$4.apply(Partition.scala:165)
at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:262)
at kafka.utils.CoreUtils$.inWriteLock(CoreUtils.scala:270)
at kafka.cluster.Partition.makeLeader(Partition.scala:165)
at kafka.server.ReplicaManager$$anonfun$makeLeaders$4.apply(ReplicaManager.scala:692)
at kafka.server.ReplicaManager$$anonfun$makeLeaders$4.apply(ReplicaManager.scala:691)
at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:99)
at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:99)
at scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:230)
at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:40)
at scala.collection.mutable.HashMap.foreach(HashMap.scala:99)
at kafka.server.ReplicaManager.makeLeaders(ReplicaManager.scala:691)
at kafka.server.ReplicaManager.becomeLeaderOrFollower(ReplicaManager.scala:637)
at kafka.server.KafkaApis.handleLeaderAndIsrRequest(KafkaApis.scala:131)
at kafka.server.KafkaApis.handle(KafkaApis.scala:72)
at kafka.server.KafkaRequestHandler.run(KafkaRequestHandler.scala:60)
at java.lang.Thread.run(Thread.java:724)




Best regards / Mit freundlichen Grüßen / Sincères salutations
M. Lohith Samaga



-Original Message-
From: isma...@gmail.com [mailto:isma...@gmail.com] On Behalf Of Ismael Juma
Sent: Monday, April 04, 2016 17.21
To: users@kafka.apache.org
Subject: Re: New consumer API waits indefinitely

Hi Lohith,

Are there any errors in your broker logs? I think there may be some issues with 
compacted topics on Windows and the new consumer uses a compacted topic to 
store offsets.

Ismael

On Mon, Apr 4, 2016 at 12:20 PM, Lohith Samaga M <lohith.sam...@mphasis.com>
wrote:

> Dear All,
> The error seems to be NOT_COORDINATOR_FOR_GROUP.
> The exception thrown in
> org.apache.kafka.clients.consumer.internals.RequestFuture is:
> org.apache.kafka.common.errors.NotCoordinatorForGroupException:
> This is not the correct coordinator for this group.
>
> However, this exception is considered RetriableException in 
> org.apache.kafka.clients.consumer.internals.RequestFuture.
> So, the retry goes on - in a loop.
>
> It also happens that the Coordinator object becomes null in 
> AbstractCoordinator class.
>
> Can somebody please help?
>
>
> Best regards / Mit freundlichen Grüßen / Sincères salutations M. 
> Lohith Samaga
>
>
>
>
> -Original Message-
> From: Ratha v [mailto:vijayara...@gmail.com]
> Sent: Monday, April 04, 2016 12.22
> 

RE: New consumer API waits indefinitely

2016-04-05 Thread Lohith Samaga M
Thanks Niko!

I think I missed an
org.apache.kafka.clients.consumer.internals.SendFailedException exception at
the very beginning (or at least it is giving an exception today).

Even after using a new install of Kafka, I get the same errors. Strangely, all
topics are re-created in the logs. I cannot find the data directory on my drive.
How can I clean up and start again?

Best regards / Mit freundlichen Grüßen / Sincères salutations
M. Lohith Samaga


-Original Message-
From: Niko Davor [mailto:nikoda...@gmail.com] 
Sent: Monday, April 04, 2016 23.59
To: users@kafka.apache.org
Subject: RE: New consumer API waits indefinitely

M. Lohith Samaga,

Your Java code looks fine.

Usually, if consumer.poll(100); doesn't return, there is probably a basic 
connection error. If Kafka can't connect, it will internally go into an 
infinite loop. To me, that doesn't seem like a good design, but that's a 
separate tangent.

Turn SLF4J root logging up to debug and you will probably see the connection 
error messages.
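
For example, if the client uses the common slf4j-log4j12 binding, a hedged
sketch of a log4j.properties on the client classpath (an assumption about the
logging setup, not something from the original thread):

log4j.rootLogger=DEBUG, stdout
log4j.appender.stdout=org.apache.log4j.ConsoleAppender
log4j.appender.stdout.layout=org.apache.log4j.PatternLayout
log4j.appender.stdout.layout.ConversionPattern=%d %p %c - %m%n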

A second thought is it might be worth trying using Kafka on a small Linux VM. 
The docs say, "Windows is not currently a well supported platform though we 
would be happy to change that.". Even if you want to use Windows as a server in 
the long run, at least as a development test option, I'd want to be able to 
test with a Linux VM.

FYI, I'm a Kafka newbie, and I've had no problems getting working code samples 
up and running with Kafka 0.9.0.1 and the new Producer/Consumer APIs. I've 
gotten code samples running in Java, Scala, and Python, and everything works, 
including cross language tests.

Lastly, as a mailing list question, how do I reply to a question like this if I 
see the original question in the web archives but it is not in my mail client? 
I suspect that this reply will show up as a different thread which is not what 
I want.


Re: New consumer API waits indefinitely

2016-04-04 Thread Ratha v
These are the same logs I get with my local Kafka server, which works fine.

On 5 April 2016 at 10:20, Ratha v  wrote:

> Hi Niko,
> I face this issue on Linux systems as well. I changed the logging level to
> debug, and when I start and stop my consumer (stopping the program) I get
> the same exception. What is the cause here?
>
> [2016-04-05 00:01:08,784] DEBUG Connection with /192.xx.xx.248
> disconnected (org.apache.kafka.common.network.Selector)
>
> kafka_1 | java.io.EOFException
>
> kafka_1 | at
> org.apache.kafka.common.network.NetworkReceive.readFromReadableChannel(NetworkReceive.java:83)
>
> kafka_1 | at
> org.apache.kafka.common.network.NetworkReceive.readFrom(NetworkReceive.java:71)
>
> kafka_1 | at
> org.apache.kafka.common.network.KafkaChannel.receive(KafkaChannel.java:153)
>
> kafka_1 | at
> org.apache.kafka.common.network.KafkaChannel.read(KafkaChannel.java:134)
>
> kafka_1 | at
> org.apache.kafka.common.network.Selector.poll(Selector.java:286)
>
> kafka_1 | at kafka.network.Processor.run(SocketServer.scala:413)
>
> kafka_1 | at java.lang.Thread.run(Thread.java:745)
>
> kafka_1 | [2016-04-05 00:01:09,236] DEBUG Got ping response for
> sessionid: 0x253405b88b300a4 after 0ms (org.apache.zookeeper.ClientCnxn)
>
> kafka_1 | [2016-04-05 00:01:11,236] DEBUG Got ping response for
> sessionid: 0x253405b88b300a4 after 0ms (org.apache.zookeeper.ClientCnxn)
>
> kafka_1 | [2016-04-05 00:01:13,238] DEBUG Got ping response for
> sessionid: 0x253405b88b300a4 after 0ms (org.apache.zookeeper.ClientCnxn)
>
> kafka_1 | [2016-04-05 00:01:14,078] DEBUG Connection with /192.168.0.248
> disconnected (org.apache.kafka.common.network.Selector)
>
> kafka_1 | java.io.EOFException
>
> kafka_1 | at
> org.apache.kafka.common.network.NetworkReceive.readFromReadableChannel(NetworkReceive.java:83)
>
> kafka_1 | at
> org.apache.kafka.common.network.NetworkReceive.readFrom(NetworkReceive.java:71)
>
> kafka_1 | at
> org.apache.kafka.common.network.KafkaChannel.receive(KafkaChannel.java:153)
>
> kafka_1 | at
> org.apache.kafka.common.network.KafkaChannel.read(KafkaChannel.java:134)
>
> kafka_1 | at
> org.apache.kafka.common.network.Selector.poll(Selector.java:286)
>
> kafka_1 | at kafka.network.Processor.run(SocketServer.scala:413)
>
> kafka_1 | at java.lang.Thread.run(Thread.java:745)
>
> kafka_1 | [2016-04-05 00:01:15,240] DEBUG Got ping response for
> sessionid: 0x253405b88b300a4 after 0ms (org.apache.zookeeper.ClientCnxn)
>
> kafka_1 | [2016-04-05 00:01:17,240] DEBUG Got ping response for
> sessionid: 0x253405b88b300a4 after 0ms (org.apache.zookeeper.ClientCnxn)
>
> kafka_1 | [2016-04-05 00:01:19,242] DEBUG Got ping response for
> sessionid: 0x253405b88b300a4 after 0ms (org.apache.zookeeper.ClientCnxn)
>
> kafka_1 | [2016-04-05 00:01:19,558] DEBUG Connection with /192.xx.xx.248
> disconnected (org.apache.kafka.common.network.Selector)
>
> kafka_1 | java.io.EOFException
>
> kafka_1 | at
> org.apache.kafka.common.network.NetworkReceive.readFromReadableChannel(NetworkReceive.java:83)
>
> kafka_1 | at
> org.apache.kafka.common.network.NetworkReceive.readFrom(NetworkReceive.java:71)
>
> kafka_1 | at
> org.apache.kafka.common.network.KafkaChannel.receive(KafkaChannel.java:153)
>
> kafka_1 | at
> org.apache.kafka.common.network.KafkaChannel.read(KafkaChannel.java:134)
>
> kafka_1 | at
> org.apache.kafka.common.network.Selector.poll(Selector.java:286)
>
> kafka_1 | at kafka.network.Processor.run(SocketServer.scala:413)
>
> kafka_1 | at java.lang.Thread.run(Thread.java:745)
>
> kafka_1 | [2016-04-05 00:01:21,242] DEBUG Got ping response for
> sessionid: 0x253405b88b300a4 after 0ms (org.apache.zookeeper.ClientCnx
>
>
> On 5 April 2016 at 04:29, Niko Davor  wrote:
>
>> M. Lohith Samaga,
>>
>> Your Java code looks fine.
>>
>> Usually, if consumer.poll(100); doesn't return, there is probably a basic
>> connection error. If Kafka can't connect, it will internally go into an
>> infinite loop. To me, that doesn't seem like a good design, but that's a
>> separate tangent.
>>
>> Turn SLF4J root logging up to debug and you will probably see the
>> connection error messages.
>>
>> A second thought is it might be worth trying using Kafka on a small Linux
>> VM. The docs say, "Windows is not currently a well supported platform
>> though we would be happy to change that.". Even if you want to use Windows
>> as a server in the long run, at least as a development test option, I'd
>> want to be able to test with a Linux VM.
>>
>> FYI, I'm a Kafka newbie, and I've had no problems getting working code
>> samples up and running with Kafka 0.9.0.1 and the new Producer/Consumer
>> APIs. I've gotten code samples running in Java, Scala, and Python, and
>> everything works, including cross language tests.
>>
>> Lastly, as a mailing list question, how do I reply to a question like this
>> if I see the original question in the web archives but it is not in my
>> mail
>> client? I suspect that this reply will show up as a 

Re: New consumer API waits indefinitely

2016-04-04 Thread Ratha v
Hi Niko,
I face this issue on Linux systems as well. I changed the logging level to
debug, and when I start and stop my consumer (stopping the program) I get the
same exception. What is the cause here?

[2016-04-05 00:01:08,784] DEBUG Connection with /192.xx.xx.248 disconnected
(org.apache.kafka.common.network.Selector)

kafka_1 | java.io.EOFException

kafka_1 | at
org.apache.kafka.common.network.NetworkReceive.readFromReadableChannel(NetworkReceive.java:83)

kafka_1 | at
org.apache.kafka.common.network.NetworkReceive.readFrom(NetworkReceive.java:71)

kafka_1 | at
org.apache.kafka.common.network.KafkaChannel.receive(KafkaChannel.java:153)

kafka_1 | at
org.apache.kafka.common.network.KafkaChannel.read(KafkaChannel.java:134)

kafka_1 | at
org.apache.kafka.common.network.Selector.poll(Selector.java:286)

kafka_1 | at kafka.network.Processor.run(SocketServer.scala:413)

kafka_1 | at java.lang.Thread.run(Thread.java:745)

kafka_1 | [2016-04-05 00:01:09,236] DEBUG Got ping response for sessionid:
0x253405b88b300a4 after 0ms (org.apache.zookeeper.ClientCnxn)

kafka_1 | [2016-04-05 00:01:11,236] DEBUG Got ping response for sessionid:
0x253405b88b300a4 after 0ms (org.apache.zookeeper.ClientCnxn)

kafka_1 | [2016-04-05 00:01:13,238] DEBUG Got ping response for sessionid:
0x253405b88b300a4 after 0ms (org.apache.zookeeper.ClientCnxn)

kafka_1 | [2016-04-05 00:01:14,078] DEBUG Connection with /192.168.0.248
disconnected (org.apache.kafka.common.network.Selector)

kafka_1 | java.io.EOFException

kafka_1 | at
org.apache.kafka.common.network.NetworkReceive.readFromReadableChannel(NetworkReceive.java:83)

kafka_1 | at
org.apache.kafka.common.network.NetworkReceive.readFrom(NetworkReceive.java:71)

kafka_1 | at
org.apache.kafka.common.network.KafkaChannel.receive(KafkaChannel.java:153)

kafka_1 | at
org.apache.kafka.common.network.KafkaChannel.read(KafkaChannel.java:134)

kafka_1 | at
org.apache.kafka.common.network.Selector.poll(Selector.java:286)

kafka_1 | at kafka.network.Processor.run(SocketServer.scala:413)

kafka_1 | at java.lang.Thread.run(Thread.java:745)

kafka_1 | [2016-04-05 00:01:15,240] DEBUG Got ping response for sessionid:
0x253405b88b300a4 after 0ms (org.apache.zookeeper.ClientCnxn)

kafka_1 | [2016-04-05 00:01:17,240] DEBUG Got ping response for sessionid:
0x253405b88b300a4 after 0ms (org.apache.zookeeper.ClientCnxn)

kafka_1 | [2016-04-05 00:01:19,242] DEBUG Got ping response for sessionid:
0x253405b88b300a4 after 0ms (org.apache.zookeeper.ClientCnxn)

kafka_1 | [2016-04-05 00:01:19,558] DEBUG Connection with /192.xx.xx.248
disconnected (org.apache.kafka.common.network.Selector)

kafka_1 | java.io.EOFException

kafka_1 | at
org.apache.kafka.common.network.NetworkReceive.readFromReadableChannel(NetworkReceive.java:83)

kafka_1 | at
org.apache.kafka.common.network.NetworkReceive.readFrom(NetworkReceive.java:71)

kafka_1 | at
org.apache.kafka.common.network.KafkaChannel.receive(KafkaChannel.java:153)

kafka_1 | at
org.apache.kafka.common.network.KafkaChannel.read(KafkaChannel.java:134)

kafka_1 | at
org.apache.kafka.common.network.Selector.poll(Selector.java:286)

kafka_1 | at kafka.network.Processor.run(SocketServer.scala:413)

kafka_1 | at java.lang.Thread.run(Thread.java:745)

kafka_1 | [2016-04-05 00:01:21,242] DEBUG Got ping response for sessionid:
0x253405b88b300a4 after 0ms (org.apache.zookeeper.ClientCnx


On 5 April 2016 at 04:29, Niko Davor  wrote:

> M. Lohith Samaga,
>
> Your Java code looks fine.
>
> Usually, if consumer.poll(100); doesn't return, there is probably a basic
> connection error. If Kafka can't connect, it will internally go into an
> infinite loop. To me, that doesn't seem like a good design, but that's a
> separate tangent.
>
> Turn SLF4J root logging up to debug and you will probably see the
> connection error messages.
>
> A second thought is it might be worth trying using Kafka on a small Linux
> VM. The docs say, "Windows is not currently a well supported platform
> though we would be happy to change that.". Even if you want to use Windows
> as a server in the long run, at least as a development test option, I'd
> want to be able to test with a Linux VM.
>
> FYI, I'm a Kafka newbie, and I've had no problems getting working code
> samples up and running with Kafka 0.9.0.1 and the new Producer/Consumer
> APIs. I've gotten code samples running in Java, Scala, and Python, and
> everything works, including cross language tests.
>
> Lastly, as a mailing list question, how do I reply to a question like this
> if I see the original question in the web archives but it is not in my mail
> client? I suspect that this reply will show up as a different thread which
> is not what I want.
>



-- 
-Ratha
http://vvratha.blogspot.com/


RE: New consumer API waits indefinitely

2016-04-04 Thread Niko Davor
M. Lohith Samaga,

Your Java code looks fine.

Usually, if consumer.poll(100); doesn't return, there is probably a basic
connection error. If Kafka can't connect, it will internally go into an
infinite loop. To me, that doesn't seem like a good design, but that's a
separate tangent.

Turn SLF4J root logging up to debug and you will probably see the
connection error messages.
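
With the common log4j binding for SLF4J, a minimal log4j.properties on the
application classpath would look something like this (the appender name and
pattern below are only examples):

# log4j.properties - raise root logging to DEBUG (assumes the slf4j-log4j12 binding)
log4j.rootLogger=DEBUG, stdout
log4j.appender.stdout=org.apache.log4j.ConsoleAppender
log4j.appender.stdout.layout=org.apache.log4j.PatternLayout
log4j.appender.stdout.layout.ConversionPattern=[%d] %p %m (%c)%n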

A second thought is it might be worth trying using Kafka on a small Linux
VM. The docs say, "Windows is not currently a well supported platform
though we would be happy to change that.". Even if you want to use Windows
as a server in the long run, at least as a development test option, I'd
want to be able to test with a Linux VM.

FYI, I'm a Kafka newbie, and I've had no problems getting working code
samples up and running with Kafka 0.9.0.1 and the new Producer/Consumer
APIs. I've gotten code samples running in Java, Scala, and Python, and
everything works, including cross language tests.

Lastly, as a mailing list question, how do I reply to a question like this
if I see the original question in the web archives but it is not in my mail
client? I suspect that this reply will show up as a different thread which
is not what I want.


Re: New consumer API waits indefinitely

2016-04-04 Thread Ismael Juma
Hi Lohith,

Are there any errors in your broker logs? I think there may be some issues
with compacted topics on Windows and the new consumer uses a compacted
topic to store offsets.

Ismael

On Mon, Apr 4, 2016 at 12:20 PM, Lohith Samaga M <lohith.sam...@mphasis.com>
wrote:

> Dear All,
> The error seems to be NOT_COORDINATOR_FOR_GROUP.
> The exception thrown in
> org.apache.kafka.clients.consumer.internals.RequestFuture is:
> org.apache.kafka.common.errors.NotCoordinatorForGroupException:
> This is not the correct coordinator for this group.
>
> However, this exception is considered a RetriableException in
> org.apache.kafka.clients.consumer.internals.RequestFuture.
> So, the retry goes on - in a loop.
>
> It also happens that the Coordinator object becomes null in
> the AbstractCoordinator class.
>
> Can somebody please help?
>
>
> Best regards / Mit freundlichen Grüßen / Sincères salutations
> M. Lohith Samaga
>
>
>
>
> -Original Message-
> From: Ratha v [mailto:vijayara...@gmail.com]
> Sent: Monday, April 04, 2016 12.22
> To: users@kafka.apache.org
> Subject: Re: New consumer API waits indefinitely
>
> Still struggling :)
> Check following threads;
>
>- If my producer producing, then why the consumer couldn't consume? it
>stuck @ poll()
>- Consumer thread is waiting forever, not returning any objects
>
>
> I think new APIs are recommended.
>
>
> On 4 April 2016 at 16:37, Lohith Samaga M <lohith.sam...@mphasis.com>
> wrote:
>
> > Thanks for letting me know.
> >
> > Is there any work around? A fix?
> >
> > Which set of API is recommended for production use?
> >
> > Best regards / Mit freundlichen Grüßen / Sincères salutations M.
> > Lohith Samaga
> >
> >
> >
> >
> > -Original Message-
> > From: Ratha v [mailto:vijayara...@gmail.com]
> > Sent: Monday, April 04, 2016 11.27
> > To: users@kafka.apache.org
> > Subject: Re: New consumer API waits indefinitely
> >
> > I too face same issue:(
> >
> > On 4 April 2016 at 15:51, Lohith Samaga M <lohith.sam...@mphasis.com>
> > wrote:
> >
> > > HI,
> > > Good morning.
> > >
> > > I am new to Kafka. So, please bear with me.
> > > I am using the new Producer and Consumer API with
> > > Kafka
> > > 0.9.0.1 running on Windows 7 laptop with zookeeper.
> > >
> > > I was able to send messages using the new Producer
> > > API. I can see the messages in the Kafka data directory.
> > >
> > > However, when I run the consumer, it does not
> > > retrieve the messages. It keeps waiting for the messages indefinitely.
> > > My code (taken from Javadoc and modified)  is as below:
> > >
> > > props.put("bootstrap.servers", "localhost:9092");
> > > props.put("group.id", "new01");
> > > props.put("enable.auto.commit", "true");
> > > props.put("auto.commit.interval.ms", "1000");
> > > props.put("session.timeout.ms", "3");
> > > props.put("key.deserializer",
> > > "org.apache.kafka.common.serialization.StringDeserializer");
> > > props.put("value.deserializer",
> > > "org.apache.kafka.common.serialization.StringDeserializer");
> > >
> > > KafkaConsumer<String, String> consumer = new
> > > KafkaConsumer<>(props);
> > > consumer.subscribe(Arrays.asList("new-producer"));
> > > while (true) {
> > > ConsumerRecords<String, String> records =
> > > consumer.poll(100);
> > > for (ConsumerRecord<String, String> record : records)
> > > System.out.printf("offset = %d, key = %s, value
> > > = %s", record.offset(), record.key(), record.value());
> > > }
> > >
> > > Can anybody please tell me what went wrong?
> > >
> > > Thanks & Regards,
> > > M. Lohith Samaga
> > >
> > > Information transmitted by this e-mail is proprietary to Mphasis,
> > > its associated companies and/ or its customers and is intended for
> > > use only by the individual or entity to which it is addr

RE: New consumer API waits indefinitely

2016-04-04 Thread Lohith Samaga M
Thanks Ratha.

I am trying to understand the code...

Best regards / Mit freundlichen Grüßen / Sincères salutations
M. Lohith Samaga


-Original Message-
From: Ratha v [mailto:vijayara...@gmail.com] 
Sent: Monday, April 04, 2016 12.22
To: users@kafka.apache.org
Subject: Re: New consumer API waits indefinitely

Still struggling :)
Check following threads;

   - If my producer producing, then why the consumer couldn't consume? it
   stuck @ poll()
   - Consumer thread is waiting forever, not returning any objects


I think new APIs are recommended.


On 4 April 2016 at 16:37, Lohith Samaga M <lohith.sam...@mphasis.com> wrote:

> Thanks for letting me know.
>
> Is there any work around? A fix?
>
> Which set of API is recommended for production use?
>
> Best regards / Mit freundlichen Grüßen / Sincères salutations M. 
> Lohith Samaga
>
>
>
>
> -Original Message-
> From: Ratha v [mailto:vijayara...@gmail.com]
> Sent: Monday, April 04, 2016 11.27
> To: users@kafka.apache.org
> Subject: Re: New consumer API waits indefinitely
>
> I too face same issue:(
>
> On 4 April 2016 at 15:51, Lohith Samaga M <lohith.sam...@mphasis.com>
> wrote:
>
> > HI,
> > Good morning.
> >
> > I am new to Kafka. So, please bear with me.
> > I am using the new Producer and Consumer API with 
> > Kafka
> > 0.9.0.1 running on Windows 7 laptop with zookeeper.
> >
> > I was able to send messages using the new Producer 
> > API. I can see the messages in the Kafka data directory.
> >
> > However, when I run the consumer, it does not 
> > retrieve the messages. It keeps waiting for the messages indefinitely.
> > My code (taken from Javadoc and modified)  is as below:
> >
> > props.put("bootstrap.servers", "localhost:9092");
> > props.put("group.id", "new01");
> > props.put("enable.auto.commit", "true");
> > props.put("auto.commit.interval.ms", "1000");
> > props.put("session.timeout.ms", "3");
> > props.put("key.deserializer", 
> > "org.apache.kafka.common.serialization.StringDeserializer");
> > props.put("value.deserializer", 
> > "org.apache.kafka.common.serialization.StringDeserializer");
> >
> > KafkaConsumer<String, String> consumer = new 
> > KafkaConsumer<>(props);
> > consumer.subscribe(Arrays.asList("new-producer"));
> > while (true) {
> > ConsumerRecords<String, String> records = 
> > consumer.poll(100);
> > for (ConsumerRecord<String, String> record : records)
> > System.out.printf("offset = %d, key = %s, value 
> > = %s", record.offset(), record.key(), record.value());
> > }
> >
> > Can anybody please tell me what went wrong?
> >
> > Thanks & Regards,
> > M. Lohith Samaga
> >
> > Information transmitted by this e-mail is proprietary to Mphasis, 
> > its associated companies and/ or its customers and is intended for 
> > use only by the individual or entity to which it is addressed, and 
> > may contain information that is privileged, confidential or exempt 
> > from disclosure under applicable law. If you are not the intended 
> > recipient or it appears that this mail has been forwarded to you 
> > without proper authority, you are notified that any use or 
> > dissemination of this information in any manner is strictly 
> > prohibited. In such cases, please notify us immediately at 
> > mailmas...@mphasis.com and delete this mail from your records.
> >
>
>
>
> --
> -Ratha
> http://vvratha.blogspot.com/
> Information transmitted by this e-mail is proprietary to Mphasis, its 
> associated companies and/ or its customers and is intended for use 
> only by the individual or entity to which it is addressed, and may 
> contain information that is privileged, confidential or exempt from 
> disclosure under applicable law. If you are not the intended recipient 
> or it appears that this mail has been forwarded to you without proper 
> authority, you are notified that any use or dissemination of this 
> information in any manner is strictly prohibited. In such cases, 
> please notify us immediately at mailmas...@mphasis.com and delete this 
> mail from your records.
>



--
-Ratha
http://vvratha.blogspot.c

Re: New consumer API waits indefinitely

2016-04-04 Thread Ratha v
Still struggling :)
Check following threads;

   - If my producer producing, then why the consumer couldn't consume? it
   stuck @ poll()
   - Consumer thread is waiting forever, not returning any objects


I think new APIs are recommended.


On 4 April 2016 at 16:37, Lohith Samaga M <lohith.sam...@mphasis.com> wrote:

> Thanks for letting me know.
>
> Is there any work around? A fix?
>
> Which set of API is recommended for production use?
>
> Best regards / Mit freundlichen Grüßen / Sincères salutations
> M. Lohith Samaga
>
>
>
>
> -Original Message-
> From: Ratha v [mailto:vijayara...@gmail.com]
> Sent: Monday, April 04, 2016 11.27
> To: users@kafka.apache.org
> Subject: Re: New consumer API waits indefinitely
>
> I too face same issue:(
>
> On 4 April 2016 at 15:51, Lohith Samaga M <lohith.sam...@mphasis.com>
> wrote:
>
> > HI,
> > Good morning.
> >
> > I am new to Kafka. So, please bear with me.
> > I am using the new Producer and Consumer API with
> > Kafka
> > 0.9.0.1 running on Windows 7 laptop with zookeeper.
> >
> > I was able to send messages using the new Producer
> > API. I can see the messages in the Kafka data directory.
> >
> > However, when I run the consumer, it does not retrieve
> > the messages. It keeps waiting for the messages indefinitely.
> > My code (taken from Javadoc and modified)  is as below:
> >
> > props.put("bootstrap.servers", "localhost:9092");
> > props.put("group.id", "new01");
> > props.put("enable.auto.commit", "true");
> > props.put("auto.commit.interval.ms", "1000");
> > props.put("session.timeout.ms", "3");
> > props.put("key.deserializer",
> > "org.apache.kafka.common.serialization.StringDeserializer");
> > props.put("value.deserializer",
> > "org.apache.kafka.common.serialization.StringDeserializer");
> >
> > KafkaConsumer<String, String> consumer = new
> > KafkaConsumer<>(props);
> > consumer.subscribe(Arrays.asList("new-producer"));
> > while (true) {
> > ConsumerRecords<String, String> records =
> > consumer.poll(100);
> > for (ConsumerRecord<String, String> record : records)
> > System.out.printf("offset = %d, key = %s, value =
> > %s", record.offset(), record.key(), record.value());
> > }
> >
> > Can anybody please tell me what went wrong?
> >
> > Thanks & Regards,
> > M. Lohith Samaga
> >
> > Information transmitted by this e-mail is proprietary to Mphasis, its
> > associated companies and/ or its customers and is intended for use
> > only by the individual or entity to which it is addressed, and may
> > contain information that is privileged, confidential or exempt from
> > disclosure under applicable law. If you are not the intended recipient
> > or it appears that this mail has been forwarded to you without proper
> > authority, you are notified that any use or dissemination of this
> > information in any manner is strictly prohibited. In such cases,
> > please notify us immediately at mailmas...@mphasis.com and delete this
> > mail from your records.
> >
>
>
>
> --
> -Ratha
> http://vvratha.blogspot.com/
> Information transmitted by this e-mail is proprietary to Mphasis, its
> associated companies and/ or its customers and is intended
> for use only by the individual or entity to which it is addressed, and may
> contain information that is privileged, confidential or
> exempt from disclosure under applicable law. If you are not the intended
> recipient or it appears that this mail has been forwarded
> to you without proper authority, you are notified that any use or
> dissemination of this information in any manner is strictly
> prohibited. In such cases, please notify us immediately at
> mailmas...@mphasis.com and delete this mail from your records.
>



-- 
-Ratha
http://vvratha.blogspot.com/


RE: New consumer API waits indefinitely

2016-04-04 Thread Lohith Samaga M
Thanks for letting me know.

Is there any work around? A fix?

Which set of API is recommended for production use?

Best regards / Mit freundlichen Grüßen / Sincères salutations
M. Lohith Samaga




-Original Message-
From: Ratha v [mailto:vijayara...@gmail.com] 
Sent: Monday, April 04, 2016 11.27
To: users@kafka.apache.org
Subject: Re: New consumer API waits indefinitely

I too face same issue:(

On 4 April 2016 at 15:51, Lohith Samaga M <lohith.sam...@mphasis.com> wrote:

> HI,
> Good morning.
>
> I am new to Kafka. So, please bear with me.
> I am using the new Producer and Consumer API with 
> Kafka
> 0.9.0.1 running on Windows 7 laptop with zookeeper.
>
> I was able to send messages using the new Producer 
> API. I can see the messages in the Kafka data directory.
>
> However, when I run the consumer, it does not retrieve 
> the messages. It keeps waiting for the messages indefinitely.
> My code (taken from Javadoc and modified)  is as below:
>
> props.put("bootstrap.servers", "localhost:9092");
> props.put("group.id", "new01");
> props.put("enable.auto.commit", "true");
> props.put("auto.commit.interval.ms", "1000");
> props.put("session.timeout.ms", "3");
> props.put("key.deserializer", 
> "org.apache.kafka.common.serialization.StringDeserializer");
> props.put("value.deserializer", 
> "org.apache.kafka.common.serialization.StringDeserializer");
>
> KafkaConsumer<String, String> consumer = new 
> KafkaConsumer<>(props);
> consumer.subscribe(Arrays.asList("new-producer"));
> while (true) {
> ConsumerRecords<String, String> records = 
> consumer.poll(100);
> for (ConsumerRecord<String, String> record : records)
> System.out.printf("offset = %d, key = %s, value = 
> %s", record.offset(), record.key(), record.value());
> }
>
> Can anybody please tell me what went wrong?
>
> Thanks & Regards,
> M. Lohith Samaga
>
> Information transmitted by this e-mail is proprietary to Mphasis, its 
> associated companies and/ or its customers and is intended for use 
> only by the individual or entity to which it is addressed, and may 
> contain information that is privileged, confidential or exempt from 
> disclosure under applicable law. If you are not the intended recipient 
> or it appears that this mail has been forwarded to you without proper 
> authority, you are notified that any use or dissemination of this 
> information in any manner is strictly prohibited. In such cases, 
> please notify us immediately at mailmas...@mphasis.com and delete this 
> mail from your records.
>



--
-Ratha
http://vvratha.blogspot.com/
Information transmitted by this e-mail is proprietary to Mphasis, its 
associated companies and/ or its customers and is intended 
for use only by the individual or entity to which it is addressed, and may 
contain information that is privileged, confidential or 
exempt from disclosure under applicable law. If you are not the intended 
recipient or it appears that this mail has been forwarded 
to you without proper authority, you are notified that any use or dissemination 
of this information in any manner is strictly 
prohibited. In such cases, please notify us immediately at 
mailmas...@mphasis.com and delete this mail from your records.


Re: New consumer API waits indefinitely

2016-04-03 Thread Ratha v
I too face same issue:(

On 4 April 2016 at 15:51, Lohith Samaga M  wrote:

> HI,
> Good morning.
>
> I am new to Kafka. So, please bear with me.
> I am using the new Producer and Consumer API with Kafka
> 0.9.0.1 running on Windows 7 laptop with zookeeper.
>
> I was able to send messages using the new Producer API. I
> can see the messages in the Kafka data directory.
>
> However, when I run the consumer, it does not retrieve the
> messages. It keeps waiting for the messages indefinitely.
> My code (taken from Javadoc and modified)  is as below:
>
> props.put("bootstrap.servers", "localhost:9092");
> props.put("group.id", "new01");
> props.put("enable.auto.commit", "true");
> props.put("auto.commit.interval.ms", "1000");
> props.put("session.timeout.ms", "3");
> props.put("key.deserializer",
> "org.apache.kafka.common.serialization.StringDeserializer");
> props.put("value.deserializer",
> "org.apache.kafka.common.serialization.StringDeserializer");
>
> KafkaConsumer<String, String> consumer = new
> KafkaConsumer<>(props);
> consumer.subscribe(Arrays.asList("new-producer"));
> while (true) {
> ConsumerRecords<String, String> records =
> consumer.poll(100);
> for (ConsumerRecord<String, String> record : records)
> System.out.printf("offset = %d, key = %s, value = %s",
> record.offset(), record.key(), record.value());
> }
>
> Can anybody please tell me what went wrong?
>
> Thanks & Regards,
> M. Lohith Samaga
>
> Information transmitted by this e-mail is proprietary to Mphasis, its
> associated companies and/ or its customers and is intended
> for use only by the individual or entity to which it is addressed, and may
> contain information that is privileged, confidential or
> exempt from disclosure under applicable law. If you are not the intended
> recipient or it appears that this mail has been forwarded
> to you without proper authority, you are notified that any use or
> dissemination of this information in any manner is strictly
> prohibited. In such cases, please notify us immediately at
> mailmas...@mphasis.com and delete this mail from your records.
>



-- 
-Ratha
http://vvratha.blogspot.com/


New consumer API waits indefinitely

2016-04-03 Thread Lohith Samaga M
HI,
Good morning.

I am new to Kafka. So, please bear with me.
I am using the new Producer and Consumer API with Kafka 0.9.0.1 
running on Windows 7 laptop with zookeeper.

I was able to send messages using the new Producer API. I can 
see the messages in the Kafka data directory.

However, when I run the consumer, it does not retrieve the 
messages. It keeps waiting for the messages indefinitely.
My code (taken from Javadoc and modified)  is as below:

props.put("bootstrap.servers", "localhost:9092");
props.put("group.id", "new01");
props.put("enable.auto.commit", "true");
props.put("auto.commit.interval.ms", "1000");
props.put("session.timeout.ms", "3");
props.put("key.deserializer", 
"org.apache.kafka.common.serialization.StringDeserializer");
props.put("value.deserializer", 
"org.apache.kafka.common.serialization.StringDeserializer");

KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
consumer.subscribe(Arrays.asList("new-producer"));
while (true) {
ConsumerRecords<String, String> records = consumer.poll(100);
for (ConsumerRecord<String, String> record : records)
System.out.printf("offset = %d, key = %s, value = %s", 
record.offset(), record.key(), record.value());
}

Can anybody please tell me what went wrong?

Thanks & Regards,
M. Lohith Samaga

Information transmitted by this e-mail is proprietary to Mphasis, its 
associated companies and/ or its customers and is intended 
for use only by the individual or entity to which it is addressed, and may 
contain information that is privileged, confidential or 
exempt from disclosure under applicable law. If you are not the intended 
recipient or it appears that this mail has been forwarded 
to you without proper authority, you are notified that any use or dissemination 
of this information in any manner is strictly 
prohibited. In such cases, please notify us immediately at 
mailmas...@mphasis.com and delete this mail from your records.


Re: new consumer api / heartbeat, manual commit & long to process messages

2016-02-26 Thread Jason Gustafson
Hey Guven,

A heartbeat API actually came up in the discussion of KIP-41. Ultimately we
rejected it because it led to confusing API semantics. The problem is that
heartbeat responses are used by the coordinator to tell consumers when a
rebalance is needed. But what should the user do if they call heartbeat()
and find that the group is rebalancing? If they don't stop message
processing and rejoin, then they may be kicked out of the group just as if
they had failed to heartbeat before expiration of the session timeout.
Alternatively, if we made heartbeat() blocking and let the rebalance
complete in the call itself, then the consumer may no longer be assigned
the same partitions. So either way, unless you can preempt message
processing, you may fall out of the group and pending messages will need to
be reprocessed after the rebalance completes. And if you can preempt
message processing, then you can ensure that heartbeats get sent by always
preempting the processor before the session timeout expires.

In the end, we felt that max.poll.records was a simpler option since it
gives you fine control over the poll loop and doesn't require any confusing
API changes. As long as you can put some upper bound on the processing
time, you can set max.poll.records=1 and the session timeout to whatever
the upper bound is.
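
As an illustration, the consumer configuration would then include something
like the following (the values are examples, and max.poll.records needs the
0.10 client):

props.put("max.poll.records", "1");         // at most one record per poll() (0.10+)
props.put("session.timeout.ms", "60000");   // example upper bound on per-record processing time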

However, if you have a use case where there is a very high variance in
message processing times, it may not be so helpful. In that case, the best
options I can think of at the moment are the following:

1. Move the processing to another thread. Basically the workflow would be
something like this: 1) receive records for a partition in poll(), 2)
submit them to an executor for processing, 3) pause the partition, and 4)
continue the poll loop. When the processor finishes with the records, you
can use resume() to reenable fetching. You'll have to manage offset commits
yourself since you wouldn't want to commit before the thread has actually
finished processing. You'll also have to account for the possibility of a
rebalance completing while the thread is still processing a batch (an easy
way to do this would probably be to just ignore CommitFailedException
thrown from commit).
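
A minimal sketch of option 1 against the 0.9 consumer API (the topic name,
group id, and process() method below are just placeholders, and
rebalance-listener handling is left out for brevity):

import java.util.*;
import java.util.concurrent.*;
import org.apache.kafka.clients.consumer.*;
import org.apache.kafka.common.TopicPartition;

public class PausingConsumer {

    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "image-downloaders");   // example group id
        props.put("enable.auto.commit", "false");     // offsets are committed manually below
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
        consumer.subscribe(Arrays.asList("images"));  // example topic

        ExecutorService worker = Executors.newSingleThreadExecutor();
        Map<TopicPartition, Future<Long>> inFlight = new HashMap<>();

        while (true) {
            // poll() keeps running (and heartbeating) while the worker does the slow work
            ConsumerRecords<String, String> records = consumer.poll(100);
            for (TopicPartition tp : records.partitions()) {
                final List<ConsumerRecord<String, String>> batch = records.records(tp);
                inFlight.put(tp, worker.submit(new Callable<Long>() {
                    public Long call() {
                        long last = -1L;
                        for (ConsumerRecord<String, String> r : batch) {
                            process(r);               // the slow per-record work
                            last = r.offset();
                        }
                        return last;
                    }
                }));
                consumer.pause(tp);                   // stop fetching until the batch is done
            }
            Iterator<Map.Entry<TopicPartition, Future<Long>>> it = inFlight.entrySet().iterator();
            while (it.hasNext()) {
                Map.Entry<TopicPartition, Future<Long>> entry = it.next();
                if (!entry.getValue().isDone())
                    continue;
                try {
                    long last = entry.getValue().get();
                    consumer.commitSync(Collections.singletonMap(
                            entry.getKey(), new OffsetAndMetadata(last + 1)));
                } catch (CommitFailedException e) {
                    // a rebalance completed while the batch was processing; the records
                    // will be redelivered to whichever consumer owns the partition now
                } catch (InterruptedException | ExecutionException e) {
                    throw new RuntimeException(e);
                }
                if (consumer.assignment().contains(entry.getKey()))
                    consumer.resume(entry.getKey());  // re-enable fetching for this partition
                it.remove();
            }
        }
    }

    private static void process(ConsumerRecord<String, String> record) {
        // download the image referenced by record.value() and save it to disk
    }
}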

2. This is a tad hacky, but you could take advantage of the fact that the
coordinator treats commits as heartbeats and call commitSync() periodically
while handling a batch of records. Note in this case that you should not
use the no-arg commitSync() variant which will commit the offsets for the
full batch returned from the last poll(). Instead you should pass the
offsets of the records already processed explicitly in
commitSync(Map).
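
A sketch of option 2 along the same lines (download() and the commit interval
of 10 records are placeholders; note that it uses the explicit-offsets variant
of commitSync, not the no-arg one):

import java.util.HashMap;
import java.util.Map;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.common.TopicPartition;

public class PeriodicCommit {

    // Commit the offsets of records that have already been handled every few
    // records, so the coordinator keeps seeing activity during a long batch.
    static void handleBatch(KafkaConsumer<String, String> consumer,
                            ConsumerRecords<String, String> records) {
        Map<TopicPartition, OffsetAndMetadata> done = new HashMap<>();
        int handled = 0;
        for (ConsumerRecord<String, String> record : records) {
            download(record.value());                 // the slow per-record work
            done.put(new TopicPartition(record.topic(), record.partition()),
                     new OffsetAndMetadata(record.offset() + 1));
            if (++handled % 10 == 0)
                consumer.commitSync(done);            // explicit offsets only; doubles as a heartbeat
        }
        if (!done.isEmpty())
            consumer.commitSync(done);                // commit whatever is left at the end
    }

    private static void download(String url) {
        // fetch the resource referenced by the message and save it to disk
    }
}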

3. Use the consumer in "simple" mode. If you don't actually need group
coordination, then you can assign the partitions you want to read from
manually and consume them at your own rate. There is no heartbeating or
rebalancing to worry about.
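
And a sketch of option 3, with the topic and partition hard-coded as examples;
since assign() bypasses group management entirely, there is no session timeout
to violate however long a record takes to handle:

import java.util.Arrays;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;

public class AssignedConsumer {

    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "image-downloaders");   // only used if offsets are committed
        props.put("enable.auto.commit", "false");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);

        // Take partition 0 of the topic directly; no subscribe(), no group rebalancing.
        consumer.assign(Arrays.asList(new TopicPartition("images", 0)));
        while (true) {
            for (ConsumerRecord<String, String> record : consumer.poll(1000))
                System.out.printf("offset = %d, value = %s%n", record.offset(), record.value());
        }
    }
}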

-Jason

On Fri, Feb 26, 2016 at 1:20 AM, Guven Demir 
wrote:

> thanks for the response Jason,
>
> i've already experimented with a similar solution myself, lowering
> max.partition.fetch.bytes to barely fit the largest message (2k at the
> moment)
>
> still, i've observed similar problems, which is caused by really long
> processing times, e.g. downloading a large video via a link received in the
> message
>
> it's not very feasible to increase the heartbeat timeout too much, as
> session timeout is recommended to be at least 3 times that of heartbeat
> timeout. and that is bounded by broker's group.max.session.timeout.ms,
> which i would not want to increase as it would affect all other
> topics/consumers
>
> could there be an api for triggering the heartbeat manually maybe? it can
> be argued that that would beat the purpose of a heartbeat though, it might
> be used improperly, i.e. in my case rather than sending heartbeats inside
> the download/save loop but in an empty loop waiting for the download to
> complete, which might never happen. again, sending heartbeats in
> application code might be considered tight coupling as well
>
> other than that, i will experiment with the pause() api, separate thread
> for the actual message processing and poll()'ing with all partitions paused
>
> guven
>
>
> > On 25 Feb 2016, at 20:19, Jason Gustafson  wrote:
> >
> > Hey Guven,
> >
> > This problem is what KIP-41 was created for:
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-41%3A+KafkaConsumer+Max+Records
> > .
> >
> > The patch for this was committed yesterday and will be included in 0.10.
> If
> > you need something in the shorter term, you could probably use the client
> > from trunk (no changes to the server are needed).
> >
> > If this is still not sufficient, I recommend looking into the pause()
> API,
> > which can facilitate asynchronous message processing in another thread.
> >
> > -Jason
> >
> > On Thu, Feb 25, 2016 at 8:53 AM, Guven Demir 

Re: new consumer api / heartbeat, manual commit & long to process messages

2016-02-26 Thread Guven Demir
thanks for the response Jason,

i've already experimented with a similar solution myself, lowering 
max.partition.fetch.bytes to barely fit the largest message (2k at the moment)

still, i've observed similar problems, which is caused by really long 
processing times, e.g. downloading a large video via a link received in the 
message

it's not very feasible to increase the heartbeat timeout too much, as session 
timeout is recommended to be at least 3 times that of heartbeat timeout. and 
that is bounded by broker's group.max.session.timeout.ms, which i would not 
want to increase as it would affect all other topics/consumers

could there be an api for triggering the heartbeat manually maybe? it can be 
argued that that would defeat the purpose of a heartbeat though, it might be used 
improperly, i.e. in my case rather than sending heartbeats inside the 
download/save loop but in an empty loop waiting for the download to complete, 
which might never happen. again, sending heartbeats in application code might 
be considered tight coupling as well

other than that, i will experiment with the pause() api, separate thread for 
the actual message processing and poll()'ing with all partitions paused

guven


> On 25 Feb 2016, at 20:19, Jason Gustafson  wrote:
> 
> Hey Guven,
> 
> This problem is what KIP-41 was created for:
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-41%3A+KafkaConsumer+Max+Records
> .
> 
> The patch for this was committed yesterday and will be included in 0.10. If
> you need something in the shorter term, you could probably use the client
> from trunk (no changes to the server are needed).
> 
> If this is still not sufficient, I recommend looking into the pause() API,
> which can facilitate asynchronous message processing in another thread.
> 
> -Jason
> 
> On Thu, Feb 25, 2016 at 8:53 AM, Guven Demir 
> wrote:
> 
>> hi all,
>> 
>> i'm having trouble processing a topic which includes paths to images which
>> need to be downloaded and saved to disk (each takes ~3-5 seconds) and
>> several are received on each poll
>> 
>> within this scenario, i'm receiving the following error:
>> 
>>org.apache.kafka.clients.consumer.CommitFailedException: Commit cannot
>> be completed due to group rebalance
>> 
>> which i assume is due to heartbeat failure and broker re-assigning the
>> consumer's partition to another consumer
>> 
>> are there any recommendations for processing long to process messages?
>> 
>> thanks in advance,
>> guven
>> 
>> 
>> 



Re: new consumer api / heartbeat, manual commit & long to process messages

2016-02-25 Thread Jason Gustafson
Hey Guven,

This problem is what KIP-41 was created for:
https://cwiki.apache.org/confluence/display/KAFKA/KIP-41%3A+KafkaConsumer+Max+Records
.

The patch for this was committed yesterday and will be included in 0.10. If
you need something in the shorter term, you could probably use the client
from trunk (no changes to the server are needed).

If this is still not sufficient, I recommend looking into the pause() API,
which can facilitate asynchronous message processing in another thread.

-Jason

On Thu, Feb 25, 2016 at 8:53 AM, Guven Demir 
wrote:

> hi all,
>
> i'm having trouble processing a topic which includes paths to images which
> need to be downloaded and saved to disk (each takes ~3-5 seconds) and
> several are received on each poll
>
> within this scenario, i'm receiving the following error:
>
> org.apache.kafka.clients.consumer.CommitFailedException: Commit cannot
> be completed due to group rebalance
>
> which i assume is due to heartbeat failure and broker re-assigning the
> consumer's partition to another consumer
>
> are there any recommendations for processing long to process messages?
>
> thanks in advance,
> guven
>
>
>


new consumer api / heartbeat, manual commit & long to process messages

2016-02-25 Thread Guven Demir
hi all,

i'm having trouble processing a topic which includes paths to images which need 
to be downloaded and saved to disk (each takes ~3-5 seconds) and several are 
received on each poll

within this scenario, i'm receiving the following error:

org.apache.kafka.clients.consumer.CommitFailedException: Commit cannot be 
completed due to group rebalance

which i assume is due to heartbeat failure and broker re-assigning the 
consumer's partition to another consumer

are there any recommendations for processing long to process messages?

thanks in advance,
guven




Re: Stuck consumer with new consumer API in 0.9

2016-01-26 Thread Bruno Rassaerts
We do not seek in the onPartitionAssigned.

In our test setup (evaluating kafka for a new project) we put a constant load 
on one of the topics.
We have a consumer group pulling messages from the different partitions on the 
topic.

At a certain point in time, the poll() does not return any messages anymore.
When this happens, we just seek() to the current offset, and messages come in 
again.

It is a bit annoying as we do seeks when there are no messages as well, but at 
least it prevents stalling the client.
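
The workaround boils down to something like the following sketch (the poll
timeout is arbitrary, and the consumer is assumed to be subscribed already):

import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;

public class SeekWorkaround {

    // If poll() comes back empty, seek every assigned partition to its current
    // position; this resets the client's internal fetch state without skipping
    // or replaying any data.
    static ConsumerRecords<String, String> pollWithReset(KafkaConsumer<String, String> consumer) {
        ConsumerRecords<String, String> records = consumer.poll(1000);
        if (records.count() == 0) {
            for (TopicPartition tp : consumer.assignment())
                consumer.seek(tp, consumer.position(tp));
        }
        return records;
    }
}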

Bruno


> On 25 Jan 2016, at 16:07, Han JU <ju.han.fe...@gmail.com> wrote:
> 
> Hi Bruno,
> 
> Can you tell me a little bit more about that? A seek() in the
> `onPartitionAssigned`?
> 
> Thanks.
> 
> 2016-01-25 10:51 GMT+01:00 Han JU <ju.han.fe...@gmail.com>:
> 
>> Ok I'll create a JIRA issue on this.
>> 
>> Thanks!
>> 
>> 2016-01-23 21:47 GMT+01:00 Bruno Rassaerts <bruno.rassae...@novazone.be>:
>> 
>>> +1 here
>>> 
>>> As a workaround we seek to the current offset which resets the current
>>> clients internal states and everything continues.
>>> 
>>> Regards,
>>> Bruno Rassaerts | Freelance Java Developer
>>> 
>>> Novazone, Edingsesteenweg 302, B-1755 Gooik, Belgium
>>> T: +32(0)54/26.02.03 - M:+32(0)477/39.01.15
>>> bruno.rassae...@novazone.be -www.novazone.be
>>> 
>>>> On 23 Jan 2016, at 17:52, Ismael Juma <ism...@juma.me.uk> wrote:
>>>> 
>>>> Hi,
>>>> 
>>>> Can you please file an issue in JIRA so that we make sure this is
>>>> investigated?
>>>> 
>>>> Ismael
>>>> 
>>>>> On Fri, Jan 22, 2016 at 3:13 PM, Han JU <ju.han.fe...@gmail.com>
>>> wrote:
>>>>> 
>>>>> Hi,
>>>>> 
>>>>> I'm prototyping with the new consumer API of kafka 0.9 and I'm
>>> particularly
>>>>> interested in the `ConsumerRebalanceListener`.
>>>>> 
>>>>> My test setup is like the following:
>>>>> - 5M messages pre-loaded in one node kafka 0.9
>>>>> - 12 partitions, auto offset commit set to false
>>>>> - in `onPartitionsRevoked`, commit offset and flush the local state
>>>>> 
>>>>> The test run is like the following:
>>>>> - launch one process with 2 consumers and let it consume for a while
>>>>> - launch another process with 2 consumers, this triggers a
>>> rebalancing,
>>>>> and let these 2 processes run until messages are all consumed
>>>>> 
>>>>> The code is here: https://gist.github.com/darkjh/fe1e5a5387bf13b4d4dd
>>>>> 
>>>>> So at first, the 2 consumers of the first process each got 6
>>> partitions.
>>>>> And after the rebalancing, each consumer got 3 partitions. It's
>>> confirmed
>>>>> by logging inside the `onPartitionAssigned` callback.
>>>>> 
>>>>> But after the rebalancing, one of the 2 consumers of the first process
>>> stop
>>>>> receiving messages, even if it has partitions assigned to:
>>>>> 
>>>>> balance-1 pulled 7237 msgs ...
>>>>> balance-0 pulled 7263 msgs ...
>>>>> 2016-01-22 15:50:37,533 [INFO] [pool-1-thread-2]
>>>>> o.a.k.c.c.i.AbstractCoordinator - Attempt to heart beat failed since
>>> the
>>>>> group is rebalancing, try to re-join group.
>>>>> balance-1 flush @ 536637
>>>>> balance-1 committed offset for List(balance-11, balance-10, balance-9,
>>>>> balance-8, balance-7, balance-6)
>>>>> 2016-01-22 15:50:37,575 [INFO] [pool-1-thread-1]
>>>>> o.a.k.c.c.i.AbstractCoordinator - Attempt to heart beat failed since
>>> the
>>>>> group is rebalancing, try to re-join group.
>>>>> balance-0 flush @ 543845
>>>>> balance-0 committed offset for List(balance-5, balance-4, balance-3,
>>>>> balance-2, balance-1, balance-0)
>>>>> balance-0 got assigned List(balance-5, balance-4, balance-3)
>>>>> balance-1 got assigned List(balance-11, balance-10, balance-9)
>>>>> balance-1 pulled 3625 msgs ...
>>>>> balance-0 pulled 3621 msgs ...
>>>>> balance-0 pulled 3631 msgs ...
>>>>> balance-0 pulled 3631 msgs ...
>>>>> balance-1 pulled 0 msgs ...
>>>>> balance-0 pulled 3643 msgs ...
>>>>> balance-0 pulled 3643 msgs ...
>

Re: Stuck consumer with new consumer API in 0.9

2016-01-26 Thread Jun Rao
Rajiv,

We haven't released 0.9.0.1 yet. To try the fix, you can build a new client
jar off the 0.9.0 branch.

Thanks,

Jun

On Mon, Jan 25, 2016 at 12:03 PM, Rajiv Kurian <ra...@signalfx.com> wrote:

> Thanks Jason. We are using an affected client I guess.
>
> Is there a 0.9.0 client available on maven? My search at
> http://mvnrepository.com/artifact/org.apache.kafka/kafka_2.10 only shows
> the 0.9.0.0 client which seems to have this issue.
>
>
> Thanks,
> Rajiv
>
> On Mon, Jan 25, 2016 at 11:56 AM, Jason Gustafson <ja...@confluent.io>
> wrote:
>
> > Hey Rajiv, the bug was on the client. Here's a link to the JIRA:
> > https://issues.apache.org/jira/browse/KAFKA-2978.
> >
> > -Jason
> >
> > On Mon, Jan 25, 2016 at 11:42 AM, Rajiv Kurian <ra...@signalfx.com>
> wrote:
> >
> > > Hi Jason,
> > >
> > > Was this a server bug or a client bug?
> > >
> > > Thanks,
> > > Rajiv
> > >
> > > On Mon, Jan 25, 2016 at 11:23 AM, Jason Gustafson <ja...@confluent.io>
> > > wrote:
> > >
> > > > Apologies for the late arrival to this thread. There was a bug in the
> > > > 0.9.0.0 release of Kafka which could cause the consumer to stop
> > fetching
> > > > from a partition after a rebalance. If you're seeing this, please
> > > checkout
> > > > the 0.9.0 branch of Kafka and see if you can reproduce this problem.
> If
> > > you
> > > > can, then it would be really helpful if you file a JIRA with the
> steps
> > to
> > > > reproduce.
> > > >
> > > > From Han's initial example, it kind of looks like the problem might
> be
> > in
> > > > the usage. The consumer lag as shown by the kafka-consumer-groups
> > script
> > > > relies on the last committed position to determine lag. To update
> > > progress,
> > > > you need to commit offsets regularly. In the gist, offsets are only
> > > > committed on shutdown or when a rebalance occurs. When the group is
> > > stable,
> > > > no progress will be seen because there are no commits to update the
> > > > position.
> > > >
> > > > Thanks,
> > > > Jason
> > > >
> > > > On Mon, Jan 25, 2016 at 9:09 AM, Ismael Juma <ism...@juma.me.uk>
> > wrote:
> > > >
> > > > > Thanks!
> > > > >
> > > > > Ismael
> > > > >
> > > > > On Mon, Jan 25, 2016 at 4:03 PM, Han JU <ju.han.fe...@gmail.com>
> > > wrote:
> > > > >
> > > > > > Issue created: https://issues.apache.org/jira/browse/KAFKA-3146
> > > > > >
> > > > > > 2016-01-25 16:07 GMT+01:00 Han JU <ju.han.fe...@gmail.com>:
> > > > > >
> > > > > > > Hi Bruno,
> > > > > > >
> > > > > > > Can you tell me a little bit more about that? A seek() in the
> > > > > > > `onPartitionAssigned`?
> > > > > > >
> > > > > > > Thanks.
> > > > > > >
> > > > > > > 2016-01-25 10:51 GMT+01:00 Han JU <ju.han.fe...@gmail.com>:
> > > > > > >
> > > > > > >> Ok I'll create a JIRA issue on this.
> > > > > > >>
> > > > > > >> Thanks!
> > > > > > >>
> > > > > > >> 2016-01-23 21:47 GMT+01:00 Bruno Rassaerts <
> > > > > bruno.rassae...@novazone.be
> > > > > > >:
> > > > > > >>
> > > > > > >>> +1 here
> > > > > > >>>
> > > > > > >>> As a workaround we seek to the current offset which resets
> the
> > > > > current
> > > > > > >>> clients internal states and everything continues.
> > > > > > >>>
> > > > > > >>> Regards,
> > > > > > >>> Bruno Rassaerts | Freelance Java Developer
> > > > > > >>>
> > > > > > >>> Novazone, Edingsesteenweg 302, B-1755 Gooik, Belgium
> > > > > > >>> T: +32(0)54/26.02.03 - M:+32(0)477/39.01.15
> > > > > > >>> bruno.rassae...@novazone.be -www.novazone.be
> > > > > > >>>
> > > > > > >>> > On 23 Jan 2016, at 17:52, Ismael Juma <

Re: Stuck consumer with new consumer API in 0.9

2016-01-26 Thread Rajiv Kurian
Thanks Jun.

On Tue, Jan 26, 2016 at 3:48 PM, Jun Rao <j...@confluent.io> wrote:

> Rajiv,
>
> We haven't released 0.9.0.1 yet. To try the fix, you can build a new client
> jar off the 0.9.0 branch.
>
> Thanks,
>
> Jun
>
> On Mon, Jan 25, 2016 at 12:03 PM, Rajiv Kurian <ra...@signalfx.com> wrote:
>
> > Thanks Jason. We are using an affected client I guess.
> >
> > Is there a 0.9.0 client available on maven? My search at
> > http://mvnrepository.com/artifact/org.apache.kafka/kafka_2.10 only shows
> > the 0.9.0.0 client which seems to have this issue.
> >
> >
> > Thanks,
> > Rajiv
> >
> > On Mon, Jan 25, 2016 at 11:56 AM, Jason Gustafson <ja...@confluent.io>
> > wrote:
> >
> > > Hey Rajiv, the bug was on the client. Here's a link to the JIRA:
> > > https://issues.apache.org/jira/browse/KAFKA-2978.
> > >
> > > -Jason
> > >
> > > On Mon, Jan 25, 2016 at 11:42 AM, Rajiv Kurian <ra...@signalfx.com>
> > wrote:
> > >
> > > > Hi Jason,
> > > >
> > > > Was this a server bug or a client bug?
> > > >
> > > > Thanks,
> > > > Rajiv
> > > >
> > > > On Mon, Jan 25, 2016 at 11:23 AM, Jason Gustafson <
> ja...@confluent.io>
> > > > wrote:
> > > >
> > > > > Apologies for the late arrival to this thread. There was a bug in
> the
> > > > > 0.9.0.0 release of Kafka which could cause the consumer to stop
> > > fetching
> > > > > from a partition after a rebalance. If you're seeing this, please
> > > > checkout
> > > > > the 0.9.0 branch of Kafka and see if you can reproduce this
> problem.
> > If
> > > > you
> > > > > can, then it would be really helpful if you file a JIRA with the
> > steps
> > > to
> > > > > reproduce.
> > > > >
> > > > > From Han's initial example, it kind of looks like the problem might
> > be
> > > in
> > > > > the usage. The consumer lag as shown by the kafka-consumer-groups
> > > script
> > > > > relies on the last committed position to determine lag. To update
> > > > progress,
> > > > > you need to commit offsets regularly. In the gist, offsets are only
> > > > > committed on shutdown or when a rebalance occurs. When the group is
> > > > stable,
> > > > > no progress will be seen because there are no commits to update the
> > > > > position.
> > > > >
> > > > > Thanks,
> > > > > Jason
> > > > >
> > > > > On Mon, Jan 25, 2016 at 9:09 AM, Ismael Juma <ism...@juma.me.uk>
> > > wrote:
> > > > >
> > > > > > Thanks!
> > > > > >
> > > > > > Ismael
> > > > > >
> > > > > > On Mon, Jan 25, 2016 at 4:03 PM, Han JU <ju.han.fe...@gmail.com>
> > > > wrote:
> > > > > >
> > > > > > > Issue created:
> https://issues.apache.org/jira/browse/KAFKA-3146
> > > > > > >
> > > > > > > 2016-01-25 16:07 GMT+01:00 Han JU <ju.han.fe...@gmail.com>:
> > > > > > >
> > > > > > > > Hi Bruno,
> > > > > > > >
> > > > > > > > Can you tell me a little bit more about that? A seek() in the
> > > > > > > > `onPartitionAssigned`?
> > > > > > > >
> > > > > > > > Thanks.
> > > > > > > >
> > > > > > > > 2016-01-25 10:51 GMT+01:00 Han JU <ju.han.fe...@gmail.com>:
> > > > > > > >
> > > > > > > >> Ok I'll create a JIRA issue on this.
> > > > > > > >>
> > > > > > > >> Thanks!
> > > > > > > >>
> > > > > > > >> 2016-01-23 21:47 GMT+01:00 Bruno Rassaerts <
> > > > > > bruno.rassae...@novazone.be
> > > > > > > >:
> > > > > > > >>
> > > > > > > >>> +1 here
> > > > > > > >>>
> > > > > > > >>> As a workaround we seek to the current offset which resets
> > the
> > > > > > current
> > > > > > > >>> clients internal states and everythin

Re: Stuck consumer with new consumer API in 0.9

2016-01-25 Thread Han JU
Hi Bruno,

Can you tell me a little bit more about that? A seek() in the
`onPartitionAssigned`?

Thanks.

2016-01-25 10:51 GMT+01:00 Han JU <ju.han.fe...@gmail.com>:

> Ok I'll create a JIRA issue on this.
>
> Thanks!
>
> 2016-01-23 21:47 GMT+01:00 Bruno Rassaerts <bruno.rassae...@novazone.be>:
>
>> +1 here
>>
>> As a workaround we seek to the current offset which resets the current
>> clients internal states and everything continues.
>>
>> Regards,
>> Bruno Rassaerts | Freelance Java Developer
>>
>> Novazone, Edingsesteenweg 302, B-1755 Gooik, Belgium
>> T: +32(0)54/26.02.03 - M:+32(0)477/39.01.15
>> bruno.rassae...@novazone.be -www.novazone.be
>>
>> > On 23 Jan 2016, at 17:52, Ismael Juma <ism...@juma.me.uk> wrote:
>> >
>> > Hi,
>> >
>> > Can you please file an issue in JIRA so that we make sure this is
>> > investigated?
>> >
>> > Ismael
>> >
>> >> On Fri, Jan 22, 2016 at 3:13 PM, Han JU <ju.han.fe...@gmail.com>
>> wrote:
>> >>
>> >> Hi,
>> >>
>> >> I'm prototyping with the new consumer API of kafka 0.9 and I'm
>> particularly
>> >> interested in the `ConsumerRebalanceListener`.
>> >>
>> >> My test setup is like the following:
>> >>  - 5M messages pre-loaded in one node kafka 0.9
>> >>  - 12 partitions, auto offset commit set to false
>> >>  - in `onPartitionsRevoked`, commit offset and flush the local state
>> >>
>> >> The test run is like the following:
>> >>  - launch one process with 2 consumers and let it consume for a while
>> >>  - launch another process with 2 consumers, this triggers a
>> rebalancing,
>> >> and let these 2 processes run until messages are all consumed
>> >>
>> >> The code is here: https://gist.github.com/darkjh/fe1e5a5387bf13b4d4dd
>> >>
>> >> So at first, the 2 consumers of the first process each got 6
>> partitions.
>> >> And after the rebalancing, each consumer got 3 partitions. It's
>> confirmed
>> >> by logging inside the `onPartitionAssigned` callback.
>> >>
>> >> But after the rebalancing, one of the 2 consumers of the first process
>> stop
>> >> receiving messages, even if it has partitions assigned to:
>> >>
>> >> balance-1 pulled 7237 msgs ...
>> >> balance-0 pulled 7263 msgs ...
>> >> 2016-01-22 15:50:37,533 [INFO] [pool-1-thread-2]
>> >> o.a.k.c.c.i.AbstractCoordinator - Attempt to heart beat failed since
>> the
>> >> group is rebalancing, try to re-join group.
>> >> balance-1 flush @ 536637
>> >> balance-1 committed offset for List(balance-11, balance-10, balance-9,
>> >> balance-8, balance-7, balance-6)
>> >> 2016-01-22 15:50:37,575 [INFO] [pool-1-thread-1]
>> >> o.a.k.c.c.i.AbstractCoordinator - Attempt to heart beat failed since
>> the
>> >> group is rebalancing, try to re-join group.
>> >> balance-0 flush @ 543845
>> >> balance-0 committed offset for List(balance-5, balance-4, balance-3,
>> >> balance-2, balance-1, balance-0)
>> >> balance-0 got assigned List(balance-5, balance-4, balance-3)
>> >> balance-1 got assigned List(balance-11, balance-10, balance-9)
>> >> balance-1 pulled 3625 msgs ...
>> >> balance-0 pulled 3621 msgs ...
>> >> balance-0 pulled 3631 msgs ...
>> >> balance-0 pulled 3631 msgs ...
>> >> balance-1 pulled 0 msgs ...
>> >> balance-0 pulled 3643 msgs ...
>> >> balance-0 pulled 3643 msgs ...
>> >> balance-1 pulled 0 msgs ...
>> >> balance-0 pulled 3622 msgs ...
>> >> balance-0 pulled 3632 msgs ...
>> >> balance-1 pulled 0 msgs ...
>> >> balance-0 pulled 3637 msgs ...
>> >> balance-0 pulled 3641 msgs ...
>> >> balance-0 pulled 3640 msgs ...
>> >> balance-1 pulled 0 msgs ...
>> >> balance-0 pulled 3632 msgs ...
>> >> balance-0 pulled 3630 msgs ...
>> >> balance-1 pulled 0 msgs ...
>> >> ..
>> >>
>> >> `balance-0` and `balance-1` are the names of the consumer thread. So
>> after
>> >> the rebalancing, thread `balance-1` continues to poll but no message
>> >> arrive, given that it has got 3 partitions assigned to after the
>> >> rebalancing.
>> >>
>> >> Finally other 3 consumers pulls all t

Re: Stuck consumer with new consumer API in 0.9

2016-01-25 Thread Rajiv Kurian
Hi Jason,

Was this a server bug or a client bug?

Thanks,
Rajiv

On Mon, Jan 25, 2016 at 11:23 AM, Jason Gustafson <ja...@confluent.io>
wrote:

> Apologies for the late arrival to this thread. There was a bug in the
> 0.9.0.0 release of Kafka which could cause the consumer to stop fetching
> from a partition after a rebalance. If you're seeing this, please checkout
> the 0.9.0 branch of Kafka and see if you can reproduce this problem. If you
> can, then it would be really helpful if you file a JIRA with the steps to
> reproduce.
>
> From Han's initial example, it kind of looks like the problem might be in
> the usage. The consumer lag as shown by the kafka-consumer-groups script
> relies on the last committed position to determine lag. To update progress,
> you need to commit offsets regularly. In the gist, offsets are only
> committed on shutdown or when a rebalance occurs. When the group is stable,
> no progress will be seen because there are no commits to update the
> position.
>
> Thanks,
> Jason
>
> On Mon, Jan 25, 2016 at 9:09 AM, Ismael Juma <ism...@juma.me.uk> wrote:
>
> > Thanks!
> >
> > Ismael
> >
> > On Mon, Jan 25, 2016 at 4:03 PM, Han JU <ju.han.fe...@gmail.com> wrote:
> >
> > > Issue created: https://issues.apache.org/jira/browse/KAFKA-3146
> > >
> > > 2016-01-25 16:07 GMT+01:00 Han JU <ju.han.fe...@gmail.com>:
> > >
> > > > Hi Bruno,
> > > >
> > > > Can you tell me a little bit more about that? A seek() in the
> > > > `onPartitionAssigned`?
> > > >
> > > > Thanks.
> > > >
> > > > 2016-01-25 10:51 GMT+01:00 Han JU <ju.han.fe...@gmail.com>:
> > > >
> > > >> Ok I'll create a JIRA issue on this.
> > > >>
> > > >> Thanks!
> > > >>
> > > >> 2016-01-23 21:47 GMT+01:00 Bruno Rassaerts <
> > bruno.rassae...@novazone.be
> > > >:
> > > >>
> > > >>> +1 here
> > > >>>
> > > >>> As a workaround we seek to the current offset which resets the
> > current
> > > >>> clients internal states and everything continues.
> > > >>>
> > > >>> Regards,
> > > >>> Bruno Rassaerts | Freelance Java Developer
> > > >>>
> > > >>> Novazone, Edingsesteenweg 302, B-1755 Gooik, Belgium
> > > >>> T: +32(0)54/26.02.03 - M:+32(0)477/39.01.15
> > > >>> bruno.rassae...@novazone.be -www.novazone.be
> > > >>>
> > > >>> > On 23 Jan 2016, at 17:52, Ismael Juma <ism...@juma.me.uk> wrote:
> > > >>> >
> > > >>> > Hi,
> > > >>> >
> > > >>> > Can you please file an issue in JIRA so that we make sure this is
> > > >>> > investigated?
> > > >>> >
> > > >>> > Ismael
> > > >>> >
> > > >>> >> On Fri, Jan 22, 2016 at 3:13 PM, Han JU <ju.han.fe...@gmail.com
> >
> > > >>> wrote:
> > > >>> >>
> > > >>> >> Hi,
> > > >>> >>
> > > >>> >> I'm prototyping with the new consumer API of kafka 0.9 and I'm
> > > >>> particularly
> > > >>> >> interested in the `ConsumerRebalanceListener`.
> > > >>> >>
> > > >>> >> My test setup is like the following:
> > > >>> >>  - 5M messages pre-loaded in one node kafka 0.9
> > > >>> >>  - 12 partitions, auto offset commit set to false
> > > >>> >>  - in `onPartitionsRevoked`, commit offset and flush the local
> > state
> > > >>> >>
> > > >>> >> The test run is like the following:
> > > >>> >>  - launch one process with 2 consumers and let it consume for a
> > > while
> > > >>> >>  - launch another process with 2 consumers, this triggers a
> > > >>> rebalancing,
> > > >>> >> and let these 2 processes run until messages are all consumed
> > > >>> >>
> > > >>> >> The code is here:
> > > https://gist.github.com/darkjh/fe1e5a5387bf13b4d4dd
> > > >>> >>
> > > >>> >> So at first, the 2 consumers of the first process each got 6
> > > >>> partitions.
> > > >&

Re: Stuck consumer with new consumer API in 0.9

2016-01-25 Thread Jason Gustafson
Apologies for the late arrival to this thread. There was a bug in the
0.9.0.0 release of Kafka which could cause the consumer to stop fetching
from a partition after a rebalance. If you're seeing this, please checkout
the 0.9.0 branch of Kafka and see if you can reproduce this problem. If you
can, then it would be really helpful if you file a JIRA with the steps to
reproduce.

From Han's initial example, it kind of looks like the problem might be in
the usage. The consumer lag as shown by the kafka-consumer-groups script
relies on the last committed position to determine lag. To update progress,
you need to commit offsets regularly. In the gist, offsets are only
committed on shutdown or when a rebalance occurs. When the group is stable,
no progress will be seen because there are no commits to update the
position.
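
With manual commits, something along these lines keeps the committed position,
and therefore the reported lag, moving (a sketch; handle() stands in for the
application's processing, and each batch is assumed to finish well within the
session timeout):

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class CommitEveryPoll {

    static void run(KafkaConsumer<String, String> consumer) {
        while (true) {
            ConsumerRecords<String, String> records = consumer.poll(100);
            for (ConsumerRecord<String, String> record : records)
                handle(record);        // application-specific processing
            consumer.commitSync();     // commits the positions returned by the last poll()
        }
    }

    private static void handle(ConsumerRecord<String, String> record) {
    }
}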

Thanks,
Jason

On Mon, Jan 25, 2016 at 9:09 AM, Ismael Juma <ism...@juma.me.uk> wrote:

> Thanks!
>
> Ismael
>
> On Mon, Jan 25, 2016 at 4:03 PM, Han JU <ju.han.fe...@gmail.com> wrote:
>
> > Issue created: https://issues.apache.org/jira/browse/KAFKA-3146
> >
> > 2016-01-25 16:07 GMT+01:00 Han JU <ju.han.fe...@gmail.com>:
> >
> > > Hi Bruno,
> > >
> > > Can you tell me a little bit more about that? A seek() in the
> > > `onPartitionAssigned`?
> > >
> > > Thanks.
> > >
> > > 2016-01-25 10:51 GMT+01:00 Han JU <ju.han.fe...@gmail.com>:
> > >
> > >> Ok I'll create a JIRA issue on this.
> > >>
> > >> Thanks!
> > >>
> > >> 2016-01-23 21:47 GMT+01:00 Bruno Rassaerts <
> bruno.rassae...@novazone.be
> > >:
> > >>
> > >>> +1 here
> > >>>
> > >>> As a workaround we seek to the current offset which resets the
> current
> > >>> clients internal states and everything continues.
> > >>>
> > >>> Regards,
> > >>> Bruno Rassaerts | Freelance Java Developer
> > >>>
> > >>> Novazone, Edingsesteenweg 302, B-1755 Gooik, Belgium
> > >>> T: +32(0)54/26.02.03 - M:+32(0)477/39.01.15
> > >>> bruno.rassae...@novazone.be -www.novazone.be
> > >>>
> > >>> > On 23 Jan 2016, at 17:52, Ismael Juma <ism...@juma.me.uk> wrote:
> > >>> >
> > >>> > Hi,
> > >>> >
> > >>> > Can you please file an issue in JIRA so that we make sure this is
> > >>> > investigated?
> > >>> >
> > >>> > Ismael
> > >>> >
> > >>> >> On Fri, Jan 22, 2016 at 3:13 PM, Han JU <ju.han.fe...@gmail.com>
> > >>> wrote:
> > >>> >>
> > >>> >> Hi,
> > >>> >>
> > >>> >> I'm prototyping with the new consumer API of kafka 0.9 and I'm
> > >>> particularly
> > >>> >> interested in the `ConsumerRebalanceListener`.
> > >>> >>
> > >>> >> My test setup is like the following:
> > >>> >>  - 5M messages pre-loaded in one node kafka 0.9
> > >>> >>  - 12 partitions, auto offset commit set to false
> > >>> >>  - in `onPartitionsRevoked`, commit offset and flush the local
> state
> > >>> >>
> > >>> >> The test run is like the following:
> > >>> >>  - launch one process with 2 consumers and let it consume for a
> > while
> > >>> >>  - launch another process with 2 consumers, this triggers a
> > >>> rebalancing,
> > >>> >> and let these 2 processes run until messages are all consumed
> > >>> >>
> > >>> >> The code is here:
> > https://gist.github.com/darkjh/fe1e5a5387bf13b4d4dd
> > >>> >>
> > >>> >> So at first, the 2 consumers of the first process each got 6
> > >>> partitions.
> > >>> >> And after the rebalancing, each consumer got 3 partitions. It's
> > >>> confirmed
> > >>> >> by logging inside the `onPartitionAssigned` callback.
> > >>> >>
> > >>> >> But after the rebalancing, one of the 2 consumers of the first
> > >>> process stop
> > >>> >> receiving messages, even if it has partitions assigned to:
> > >>> >>
> > >>> >> balance-1 pulled 7237 msgs ...
> > >>> >> balance-0 pulled 7263 msgs ...
> > >

Re: Stuck consumer with new consumer API in 0.9

2016-01-25 Thread Guozhang Wang
Han,

From your logs it seems the thread which cannot fetch more data is
rebalance-1, which is assigned with partitions [balance-11, balance-10,
balance-9];

From your consumer-group command the partitions that are lagging are [balance-6,
balance-7, balance-8], which are not assigned to this process, and [11, 10,
9] are all caught up.

So I'm a little confused here: it seems reasonable for rebalance-1 to
fetch no data, since its partitions have all been consumed, and [6, 7, 8] are
indeed lagging, but it's not clear why, as we cannot see the logs from the
other process.

BTW, the bug that Jason mentioned is KAFKA-2978
<https://issues.apache.org/jira/browse/KAFKA-2978>, and the fix is already
merged into the 0.9.0 branch, so if you still see this on the 0.9.0 branch, that
bug should not be the cause of your issue.

Guozhang


On Mon, Jan 25, 2016 at 11:23 AM, Jason Gustafson <ja...@confluent.io>
wrote:

> Apologies for the late arrival to this thread. There was a bug in the
> 0.9.0.0 release of Kafka which could cause the consumer to stop fetching
> from a partition after a rebalance. If you're seeing this, please checkout
> the 0.9.0 branch of Kafka and see if you can reproduce this problem. If you
> can, then it would be really helpful if you file a JIRA with the steps to
> reproduce.
>
> From Han's initial example, it kind of looks like the problem might be in
> the usage. The consumer lag as shown by the kafka-consumer-groups script
> relies on the last committed position to determine lag. To update progress,
> you need to commit offsets regularly. In the gist, offsets are only
> committed on shutdown or when a rebalance occurs. When the group is stable,
> no progress will be seen because there are no commits to update the
> position.
>
> Thanks,
> Jason
>
> On Mon, Jan 25, 2016 at 9:09 AM, Ismael Juma <ism...@juma.me.uk> wrote:
>
> > Thanks!
> >
> > Ismael
> >
> > On Mon, Jan 25, 2016 at 4:03 PM, Han JU <ju.han.fe...@gmail.com> wrote:
> >
> > > Issue created: https://issues.apache.org/jira/browse/KAFKA-3146
> > >
> > > 2016-01-25 16:07 GMT+01:00 Han JU <ju.han.fe...@gmail.com>:
> > >
> > > > Hi Bruno,
> > > >
> > > > Can you tell me a little bit more about that? A seek() in the
> > > > `onPartitionAssigned`?
> > > >
> > > > Thanks.
> > > >
> > > > 2016-01-25 10:51 GMT+01:00 Han JU <ju.han.fe...@gmail.com>:
> > > >
> > > >> Ok I'll create a JIRA issue on this.
> > > >>
> > > >> Thanks!
> > > >>
> > > >> 2016-01-23 21:47 GMT+01:00 Bruno Rassaerts <
> > bruno.rassae...@novazone.be
> > > >:
> > > >>
> > > >>> +1 here
> > > >>>
> > > >>> As a workaround we seek to the current offset which resets the
> > current
> > > >>> clients internal states and everything continues.
> > > >>>
> > > >>> Regards,
> > > >>> Bruno Rassaerts | Freelance Java Developer
> > > >>>
> > > >>> Novazone, Edingsesteenweg 302, B-1755 Gooik, Belgium
> > > >>> T: +32(0)54/26.02.03 - M:+32(0)477/39.01.15
> > > >>> bruno.rassae...@novazone.be -www.novazone.be
> > > >>>
> > > >>> > On 23 Jan 2016, at 17:52, Ismael Juma <ism...@juma.me.uk> wrote:
> > > >>> >
> > > >>> > Hi,
> > > >>> >
> > > >>> > Can you please file an issue in JIRA so that we make sure this is
> > > >>> > investigated?
> > > >>> >
> > > >>> > Ismael
> > > >>> >
> > > >>> >> On Fri, Jan 22, 2016 at 3:13 PM, Han JU <ju.han.fe...@gmail.com
> >
> > > >>> wrote:
> > > >>> >>
> > > >>> >> Hi,
> > > >>> >>
> > > >>> >> I'm prototyping with the new consumer API of kafka 0.9 and I'm
> > > >>> particularly
> > > >>> >> interested in the `ConsumerRebalanceListener`.
> > > >>> >>
> > > >>> >> My test setup is like the following:
> > > >>> >>  - 5M messages pre-loaded in one node kafka 0.9
> > > >>> >>  - 12 partitions, auto offset commit set to false
> > > >>> >>  - in `onPartitionsRevoked`, commit offset and flush the local
> > state
> > > >>> >>
> > > >>> >> 

Re: Stuck consumer with new consumer API in 0.9

2016-01-25 Thread Rajiv Kurian
Thanks Jason. We are using an affected client, I guess.

Is there a fixed 0.9.0 client available on Maven? My search at
http://mvnrepository.com/artifact/org.apache.kafka/kafka_2.10 only shows
the 0.9.0.0 client, which seems to have this issue.


Thanks,
Rajiv

On Mon, Jan 25, 2016 at 11:56 AM, Jason Gustafson <ja...@confluent.io>
wrote:

> Hey Rajiv, the bug was on the client. Here's a link to the JIRA:
> https://issues.apache.org/jira/browse/KAFKA-2978.
>
> -Jason
>
> On Mon, Jan 25, 2016 at 11:42 AM, Rajiv Kurian <ra...@signalfx.com> wrote:
>
> > Hi Jason,
> >
> > Was this a server bug or a client bug?
> >
> > Thanks,
> > Rajiv
> >
> > On Mon, Jan 25, 2016 at 11:23 AM, Jason Gustafson <ja...@confluent.io>
> > wrote:
> >
> > > Apologies for the late arrival to this thread. There was a bug in the
> > > 0.9.0.0 release of Kafka which could cause the consumer to stop
> fetching
> > > from a partition after a rebalance. If you're seeing this, please
> > checkout
> > > the 0.9.0 branch of Kafka and see if you can reproduce this problem. If
> > you
> > > can, then it would be really helpful if you file a JIRA with the steps
> to
> > > reproduce.
> > >
> > > From Han's initial example, it kind of looks like the problem might be
> in
> > > the usage. The consumer lag as shown by the kafka-consumer-groups
> script
> > > relies on the last committed position to determine lag. To update
> > progress,
> > > you need to commit offsets regularly. In the gist, offsets are only
> > > committed on shutdown or when a rebalance occurs. When the group is
> > stable,
> > > no progress will be seen because there are no commits to update the
> > > position.
> > >
> > > Thanks,
> > > Jason
> > >
> > > On Mon, Jan 25, 2016 at 9:09 AM, Ismael Juma <ism...@juma.me.uk>
> wrote:
> > >
> > > > Thanks!
> > > >
> > > > Ismael
> > > >
> > > > On Mon, Jan 25, 2016 at 4:03 PM, Han JU <ju.han.fe...@gmail.com>
> > wrote:
> > > >
> > > > > Issue created: https://issues.apache.org/jira/browse/KAFKA-3146
> > > > >
> > > > > 2016-01-25 16:07 GMT+01:00 Han JU <ju.han.fe...@gmail.com>:
> > > > >
> > > > > > Hi Bruno,
> > > > > >
> > > > > > Can you tell me a little bit more about that? A seek() in the
> > > > > > `onPartitionAssigned`?
> > > > > >
> > > > > > Thanks.
> > > > > >
> > > > > > 2016-01-25 10:51 GMT+01:00 Han JU <ju.han.fe...@gmail.com>:
> > > > > >
> > > > > >> Ok I'll create a JIRA issue on this.
> > > > > >>
> > > > > >> Thanks!
> > > > > >>
> > > > > >> 2016-01-23 21:47 GMT+01:00 Bruno Rassaerts <
> > > > bruno.rassae...@novazone.be
> > > > > >:
> > > > > >>
> > > > > >>> +1 here
> > > > > >>>
> > > > > >>> As a workaround we seek to the current offset which resets the
> > > > current
> > > > > >>> clients internal states and everything continues.
> > > > > >>>
> > > > > >>> Regards,
> > > > > >>> Bruno Rassaerts | Freelance Java Developer
> > > > > >>>
> > > > > >>> Novazone, Edingsesteenweg 302, B-1755 Gooik, Belgium
> > > > > >>> T: +32(0)54/26.02.03 - M:+32(0)477/39.01.15
> > > > > >>> bruno.rassae...@novazone.be -www.novazone.be
> > > > > >>>
> > > > > >>> > On 23 Jan 2016, at 17:52, Ismael Juma <ism...@juma.me.uk>
> > wrote:
> > > > > >>> >
> > > > > >>> > Hi,
> > > > > >>> >
> > > > > >>> > Can you please file an issue in JIRA so that we make sure
> this
> > is
> > > > > >>> > investigated?
> > > > > >>> >
> > > > > >>> > Ismael
> > > > > >>> >
> > > > > >>> >> On Fri, Jan 22, 2016 at 3:13 PM, Han JU <
> > ju.han.fe...@gmail.com
> > > >
> > > > > >>> wrote:
> > > > > >>> >>
> > 

Re: Stuck consumer with new consumer API in 0.9

2016-01-25 Thread Ismael Juma
Thanks!

Ismael

On Mon, Jan 25, 2016 at 4:03 PM, Han JU <ju.han.fe...@gmail.com> wrote:

> Issue created: https://issues.apache.org/jira/browse/KAFKA-3146
>
> 2016-01-25 16:07 GMT+01:00 Han JU <ju.han.fe...@gmail.com>:
>
> > Hi Bruno,
> >
> > Can you tell me a little bit more about that? A seek() in the
> > `onPartitionAssigned`?
> >
> > Thanks.
> >
> > 2016-01-25 10:51 GMT+01:00 Han JU <ju.han.fe...@gmail.com>:
> >
> >> Ok I'll create a JIRA issue on this.
> >>
> >> Thanks!
> >>
> >> 2016-01-23 21:47 GMT+01:00 Bruno Rassaerts <bruno.rassae...@novazone.be
> >:
> >>
> >>> +1 here
> >>>
> >>> As a workaround we seek to the current offset which resets the current
> >>> clients internal states and everything continues.
> >>>
> >>> Regards,
> >>> Bruno Rassaerts | Freelance Java Developer
> >>>
> >>> Novazone, Edingsesteenweg 302, B-1755 Gooik, Belgium
> >>> T: +32(0)54/26.02.03 - M:+32(0)477/39.01.15
> >>> bruno.rassae...@novazone.be -www.novazone.be
> >>>
> >>> > On 23 Jan 2016, at 17:52, Ismael Juma <ism...@juma.me.uk> wrote:
> >>> >
> >>> > Hi,
> >>> >
> >>> > Can you please file an issue in JIRA so that we make sure this is
> >>> > investigated?
> >>> >
> >>> > Ismael
> >>> >
> >>> >> On Fri, Jan 22, 2016 at 3:13 PM, Han JU <ju.han.fe...@gmail.com>
> >>> wrote:
> >>> >>
> >>> >> Hi,
> >>> >>
> >>> >> I'm prototyping with the new consumer API of kafka 0.9 and I'm
> >>> particularly
> >>> >> interested in the `ConsumerRebalanceListener`.
> >>> >>
> >>> >> My test setup is like the following:
> >>> >>  - 5M messages pre-loaded in one node kafka 0.9
> >>> >>  - 12 partitions, auto offset commit set to false
> >>> >>  - in `onPartitionsRevoked`, commit offset and flush the local state
> >>> >>
> >>> >> The test run is like the following:
> >>> >>  - launch one process with 2 consumers and let it consume for a
> while
> >>> >>  - launch another process with 2 consumers, this triggers a
> >>> rebalancing,
> >>> >> and let these 2 processes run until messages are all consumed
> >>> >>
> >>> >> The code is here:
> https://gist.github.com/darkjh/fe1e5a5387bf13b4d4dd
> >>> >>
> >>> >> So at first, the 2 consumers of the first process each got 6
> >>> partitions.
> >>> >> And after the rebalancing, each consumer got 3 partitions. It's
> >>> confirmed
> >>> >> by logging inside the `onPartitionAssigned` callback.
> >>> >>
> >>> >> But after the rebalancing, one of the 2 consumers of the first
> >>> process stop
> >>> >> receiving messages, even if it has partitions assigned to:
> >>> >>
> >>> >> balance-1 pulled 7237 msgs ...
> >>> >> balance-0 pulled 7263 msgs ...
> >>> >> 2016-01-22 15:50:37,533 [INFO] [pool-1-thread-2]
> >>> >> o.a.k.c.c.i.AbstractCoordinator - Attempt to heart beat failed since
> >>> the
> >>> >> group is rebalancing, try to re-join group.
> >>> >> balance-1 flush @ 536637
> >>> >> balance-1 committed offset for List(balance-11, balance-10,
> balance-9,
> >>> >> balance-8, balance-7, balance-6)
> >>> >> 2016-01-22 15:50:37,575 [INFO] [pool-1-thread-1]
> >>> >> o.a.k.c.c.i.AbstractCoordinator - Attempt to heart beat failed since
> >>> the
> >>> >> group is rebalancing, try to re-join group.
> >>> >> balance-0 flush @ 543845
> >>> >> balance-0 committed offset for List(balance-5, balance-4, balance-3,
> >>> >> balance-2, balance-1, balance-0)
> >>> >> balance-0 got assigned List(balance-5, balance-4, balance-3)
> >>> >> balance-1 got assigned List(balance-11, balance-10, balance-9)
> >>> >> balance-1 pulled 3625 msgs ...
> >>> >> balance-0 pulled 3621 msgs ...
> >>> >> balance-0 pulled 3631 msgs ...
> >>> >> balance-0 pulled 3631 msgs ...
> >>

Re: Stuck consumer with new consumer API in 0.9

2016-01-25 Thread Jason Gustafson
Hey Rajiv, the bug was on the client. Here's a link to the JIRA:
https://issues.apache.org/jira/browse/KAFKA-2978.

-Jason

On Mon, Jan 25, 2016 at 11:42 AM, Rajiv Kurian <ra...@signalfx.com> wrote:

> Hi Jason,
>
> Was this a server bug or a client bug?
>
> Thanks,
> Rajiv
>
> On Mon, Jan 25, 2016 at 11:23 AM, Jason Gustafson <ja...@confluent.io>
> wrote:
>
> > Apologies for the late arrival to this thread. There was a bug in the
> > 0.9.0.0 release of Kafka which could cause the consumer to stop fetching
> > from a partition after a rebalance. If you're seeing this, please
> checkout
> > the 0.9.0 branch of Kafka and see if you can reproduce this problem. If
> you
> > can, then it would be really helpful if you file a JIRA with the steps to
> > reproduce.
> >
> > From Han's initial example, it kind of looks like the problem might be in
> > the usage. The consumer lag as shown by the kafka-consumer-groups script
> > relies on the last committed position to determine lag. To update
> progress,
> > you need to commit offsets regularly. In the gist, offsets are only
> > committed on shutdown or when a rebalance occurs. When the group is
> stable,
> > no progress will be seen because there are no commits to update the
> > position.
> >
> > Thanks,
> > Jason
> >
> > On Mon, Jan 25, 2016 at 9:09 AM, Ismael Juma <ism...@juma.me.uk> wrote:
> >
> > > Thanks!
> > >
> > > Ismael
> > >
> > > On Mon, Jan 25, 2016 at 4:03 PM, Han JU <ju.han.fe...@gmail.com>
> wrote:
> > >
> > > > Issue created: https://issues.apache.org/jira/browse/KAFKA-3146
> > > >
> > > > 2016-01-25 16:07 GMT+01:00 Han JU <ju.han.fe...@gmail.com>:
> > > >
> > > > > Hi Bruno,
> > > > >
> > > > > Can you tell me a little bit more about that? A seek() in the
> > > > > `onPartitionAssigned`?
> > > > >
> > > > > Thanks.
> > > > >
> > > > > 2016-01-25 10:51 GMT+01:00 Han JU <ju.han.fe...@gmail.com>:
> > > > >
> > > > >> Ok I'll create a JIRA issue on this.
> > > > >>
> > > > >> Thanks!
> > > > >>
> > > > >> 2016-01-23 21:47 GMT+01:00 Bruno Rassaerts <
> > > bruno.rassae...@novazone.be
> > > > >:
> > > > >>
> > > > >>> +1 here
> > > > >>>
> > > > >>> As a workaround we seek to the current offset which resets the
> > > current
> > > > >>> clients internal states and everything continues.
> > > > >>>
> > > > >>> Regards,
> > > > >>> Bruno Rassaerts | Freelance Java Developer
> > > > >>>
> > > > >>> Novazone, Edingsesteenweg 302, B-1755 Gooik, Belgium
> > > > >>> T: +32(0)54/26.02.03 - M:+32(0)477/39.01.15
> > > > >>> bruno.rassae...@novazone.be -www.novazone.be
> > > > >>>
> > > > >>> > On 23 Jan 2016, at 17:52, Ismael Juma <ism...@juma.me.uk>
> wrote:
> > > > >>> >
> > > > >>> > Hi,
> > > > >>> >
> > > > >>> > Can you please file an issue in JIRA so that we make sure this
> is
> > > > >>> > investigated?
> > > > >>> >
> > > > >>> > Ismael
> > > > >>> >
> > > > >>> >> On Fri, Jan 22, 2016 at 3:13 PM, Han JU <
> ju.han.fe...@gmail.com
> > >
> > > > >>> wrote:
> > > > >>> >>
> > > > >>> >> Hi,
> > > > >>> >>
> > > > >>> >> I'm prototyping with the new consumer API of kafka 0.9 and I'm
> > > > >>> particularly
> > > > >>> >> interested in the `ConsumerRebalanceListener`.
> > > > >>> >>
> > > > >>> >> My test setup is like the following:
> > > > >>> >>  - 5M messages pre-loaded in one node kafka 0.9
> > > > >>> >>  - 12 partitions, auto offset commit set to false
> > > > >>> >>  - in `onPartitionsRevoked`, commit offset and flush the local
> > > state
> > > > >>> >>
> > > > >>> >&

Re: Stuck consumer with new consumer API in 0.9

2016-01-25 Thread Han JU
Issue created: https://issues.apache.org/jira/browse/KAFKA-3146

2016-01-25 16:07 GMT+01:00 Han JU <ju.han.fe...@gmail.com>:

> Hi Bruno,
>
> Can you tell me a little bit more about that? A seek() in the
> `onPartitionAssigned`?
>
> Thanks.
>
> 2016-01-25 10:51 GMT+01:00 Han JU <ju.han.fe...@gmail.com>:
>
>> Ok I'll create a JIRA issue on this.
>>
>> Thanks!
>>
>> 2016-01-23 21:47 GMT+01:00 Bruno Rassaerts <bruno.rassae...@novazone.be>:
>>
>>> +1 here
>>>
>>> As a workaround we seek to the current offset which resets the current
>>> clients internal states and everything continues.
>>>
>>> Regards,
>>> Bruno Rassaerts | Freelance Java Developer
>>>
>>> Novazone, Edingsesteenweg 302, B-1755 Gooik, Belgium
>>> T: +32(0)54/26.02.03 - M:+32(0)477/39.01.15
>>> bruno.rassae...@novazone.be -www.novazone.be
>>>
>>> > On 23 Jan 2016, at 17:52, Ismael Juma <ism...@juma.me.uk> wrote:
>>> >
>>> > Hi,
>>> >
>>> > Can you please file an issue in JIRA so that we make sure this is
>>> > investigated?
>>> >
>>> > Ismael
>>> >
>>> >> On Fri, Jan 22, 2016 at 3:13 PM, Han JU <ju.han.fe...@gmail.com>
>>> wrote:
>>> >>
>>> >> Hi,
>>> >>
>>> >> I'm prototyping with the new consumer API of kafka 0.9 and I'm
>>> particularly
>>> >> interested in the `ConsumerRebalanceListener`.
>>> >>
>>> >> My test setup is like the following:
>>> >>  - 5M messages pre-loaded in one node kafka 0.9
>>> >>  - 12 partitions, auto offset commit set to false
>>> >>  - in `onPartitionsRevoked`, commit offset and flush the local state
>>> >>
>>> >> The test run is like the following:
>>> >>  - launch one process with 2 consumers and let it consume for a while
>>> >>  - launch another process with 2 consumers, this triggers a
>>> rebalancing,
>>> >> and let these 2 processes run until messages are all consumed
>>> >>
>>> >> The code is here: https://gist.github.com/darkjh/fe1e5a5387bf13b4d4dd
>>> >>
>>> >> So at first, the 2 consumers of the first process each got 6
>>> partitions.
>>> >> And after the rebalancing, each consumer got 3 partitions. It's
>>> confirmed
>>> >> by logging inside the `onPartitionAssigned` callback.
>>> >>
>>> >> But after the rebalancing, one of the 2 consumers of the first
>>> process stop
>>> >> receiving messages, even if it has partitions assigned to:
>>> >>
>>> >> balance-1 pulled 7237 msgs ...
>>> >> balance-0 pulled 7263 msgs ...
>>> >> 2016-01-22 15:50:37,533 [INFO] [pool-1-thread-2]
>>> >> o.a.k.c.c.i.AbstractCoordinator - Attempt to heart beat failed since
>>> the
>>> >> group is rebalancing, try to re-join group.
>>> >> balance-1 flush @ 536637
>>> >> balance-1 committed offset for List(balance-11, balance-10, balance-9,
>>> >> balance-8, balance-7, balance-6)
>>> >> 2016-01-22 15:50:37,575 [INFO] [pool-1-thread-1]
>>> >> o.a.k.c.c.i.AbstractCoordinator - Attempt to heart beat failed since
>>> the
>>> >> group is rebalancing, try to re-join group.
>>> >> balance-0 flush @ 543845
>>> >> balance-0 committed offset for List(balance-5, balance-4, balance-3,
>>> >> balance-2, balance-1, balance-0)
>>> >> balance-0 got assigned List(balance-5, balance-4, balance-3)
>>> >> balance-1 got assigned List(balance-11, balance-10, balance-9)
>>> >> balance-1 pulled 3625 msgs ...
>>> >> balance-0 pulled 3621 msgs ...
>>> >> balance-0 pulled 3631 msgs ...
>>> >> balance-0 pulled 3631 msgs ...
>>> >> balance-1 pulled 0 msgs ...
>>> >> balance-0 pulled 3643 msgs ...
>>> >> balance-0 pulled 3643 msgs ...
>>> >> balance-1 pulled 0 msgs ...
>>> >> balance-0 pulled 3622 msgs ...
>>> >> balance-0 pulled 3632 msgs ...
>>> >> balance-1 pulled 0 msgs ...
>>> >> balance-0 pulled 3637 msgs ...
>>> >> balance-0 pulled 3641 msgs ...
>>> >> balance-0 pulled 3640 msgs ...
>>> >> balance-1 pulled 0 msgs ...
>>> >> bal

Re: Stuck consumer with new consumer API in 0.9

2016-01-25 Thread Han JU
Ok I'll create a JIRA issue on this.

Thanks!

2016-01-23 21:47 GMT+01:00 Bruno Rassaerts <bruno.rassae...@novazone.be>:

> +1 here
>
> As a workaround we seek to the current offset which resets the current
> clients internal states and everything continues.
>
> Regards,
> Bruno Rassaerts | Freelance Java Developer
>
> Novazone, Edingsesteenweg 302, B-1755 Gooik, Belgium
> T: +32(0)54/26.02.03 - M:+32(0)477/39.01.15
> bruno.rassae...@novazone.be -www.novazone.be
>
> > On 23 Jan 2016, at 17:52, Ismael Juma <ism...@juma.me.uk> wrote:
> >
> > Hi,
> >
> > Can you please file an issue in JIRA so that we make sure this is
> > investigated?
> >
> > Ismael
> >
> >> On Fri, Jan 22, 2016 at 3:13 PM, Han JU <ju.han.fe...@gmail.com> wrote:
> >>
> >> Hi,
> >>
> >> I'm prototyping with the new consumer API of kafka 0.9 and I'm
> particularly
> >> interested in the `ConsumerRebalanceListener`.
> >>
> >> My test setup is like the following:
> >>  - 5M messages pre-loaded in one node kafka 0.9
> >>  - 12 partitions, auto offset commit set to false
> >>  - in `onPartitionsRevoked`, commit offset and flush the local state
> >>
> >> The test run is like the following:
> >>  - launch one process with 2 consumers and let it consume for a while
> >>  - launch another process with 2 consumers, this triggers a rebalancing,
> >> and let these 2 processes run until messages are all consumed
> >>
> >> The code is here: https://gist.github.com/darkjh/fe1e5a5387bf13b4d4dd
> >>
> >> So at first, the 2 consumers of the first process each got 6 partitions.
> >> And after the rebalancing, each consumer got 3 partitions. It's
> confirmed
> >> by logging inside the `onPartitionAssigned` callback.
> >>
> >> But after the rebalancing, one of the 2 consumers of the first process
> stop
> >> receiving messages, even if it has partitions assigned to:
> >>
> >> balance-1 pulled 7237 msgs ...
> >> balance-0 pulled 7263 msgs ...
> >> 2016-01-22 15:50:37,533 [INFO] [pool-1-thread-2]
> >> o.a.k.c.c.i.AbstractCoordinator - Attempt to heart beat failed since the
> >> group is rebalancing, try to re-join group.
> >> balance-1 flush @ 536637
> >> balance-1 committed offset for List(balance-11, balance-10, balance-9,
> >> balance-8, balance-7, balance-6)
> >> 2016-01-22 15:50:37,575 [INFO] [pool-1-thread-1]
> >> o.a.k.c.c.i.AbstractCoordinator - Attempt to heart beat failed since the
> >> group is rebalancing, try to re-join group.
> >> balance-0 flush @ 543845
> >> balance-0 committed offset for List(balance-5, balance-4, balance-3,
> >> balance-2, balance-1, balance-0)
> >> balance-0 got assigned List(balance-5, balance-4, balance-3)
> >> balance-1 got assigned List(balance-11, balance-10, balance-9)
> >> balance-1 pulled 3625 msgs ...
> >> balance-0 pulled 3621 msgs ...
> >> balance-0 pulled 3631 msgs ...
> >> balance-0 pulled 3631 msgs ...
> >> balance-1 pulled 0 msgs ...
> >> balance-0 pulled 3643 msgs ...
> >> balance-0 pulled 3643 msgs ...
> >> balance-1 pulled 0 msgs ...
> >> balance-0 pulled 3622 msgs ...
> >> balance-0 pulled 3632 msgs ...
> >> balance-1 pulled 0 msgs ...
> >> balance-0 pulled 3637 msgs ...
> >> balance-0 pulled 3641 msgs ...
> >> balance-0 pulled 3640 msgs ...
> >> balance-1 pulled 0 msgs ...
> >> balance-0 pulled 3632 msgs ...
> >> balance-0 pulled 3630 msgs ...
> >> balance-1 pulled 0 msgs ...
> >> ..
> >>
> >> `balance-0` and `balance-1` are the names of the consumer thread. So
> after
> >> the rebalancing, thread `balance-1` continues to poll but no message
> >> arrive, given that it has got 3 partitions assigned to after the
> >> rebalancing.
> >>
> >> Finally other 3 consumers pulls all their partitions' message, the
> >> situation is like
> >>
> >> GROUP, TOPIC, PARTITION, CURRENT OFFSET, LOG END OFFSET, LAG, OWNER
> >> balance-test, balance, 9, 417467, 417467, 0, consumer-2_/127.0.0.1
> >> balance-test, balance, 10, 417467, 417467, 0, consumer-2_/127.0.0.1
> >> balance-test, balance, 11, 417467, 417467, 0, consumer-2_/127.0.0.1
> >> balance-test, balance, 6, 180269, 417467, 237198, consumer-2_/127.0.0.1
> >> balance-test, balance, 7, 180036, 417468, 237432, consumer-2_/127.0.0.1
> >> balance-test, balance, 8, 180
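
For reference, the seek-to-current-position workaround Bruno describes above
can be sketched roughly as below. This is an illustration, not code from the
gist; it assumes a final KafkaConsumer<String, String> named consumer is in
scope for the listener:

    consumer.subscribe(Arrays.asList("balance"), new ConsumerRebalanceListener() {
        @Override
        public void onPartitionsRevoked(Collection<TopicPartition> partitions) {
            // commit offsets and flush local state, as in the original test
            consumer.commitSync();
        }
        @Override
        public void onPartitionsAssigned(Collection<TopicPartition> partitions) {
            // Workaround: re-seek each assigned partition to its current
            // position, which resets the client's internal fetch state.
            for (TopicPartition tp : partitions) {
                consumer.seek(tp, consumer.position(tp));
            }
        }
    });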

Re: Stuck consumer with new consumer API in 0.9

2016-01-23 Thread 何伟昌
+1, facing same issue.

> On 22 Jan 2016, at 11:13 PM, Han JU <ju.han.fe...@gmail.com> wrote:
> 
> Hi,
> 
> I'm prototyping with the new consumer API of kafka 0.9 and I'm particularly
> interested in the `ConsumerRebalanceListener`.
> 
> My test setup is like the following:
>  - 5M messages pre-loaded in one node kafka 0.9
>  - 12 partitions, auto offset commit set to false
>  - in `onPartitionsRevoked`, commit offset and flush the local state
> 
> The test run is like the following:
>  - launch one process with 2 consumers and let it consume for a while
>  - launch another process with 2 consumers, this triggers a rebalancing,
> and let these 2 processes run until messages are all consumed
> 
> The code is here: https://gist.github.com/darkjh/fe1e5a5387bf13b4d4dd
> 
> So at first, the 2 consumers of the first process each got 6 partitions.
> And after the rebalancing, each consumer got 3 partitions. It's confirmed
> by logging inside the `onPartitionAssigned` callback.
> 
> But after the rebalancing, one of the 2 consumers of the first process stop
> receiving messages, even if it has partitions assigned to:
> 
> balance-1 pulled 7237 msgs ...
> balance-0 pulled 7263 msgs ...
> 2016-01-22 15:50:37,533 [INFO] [pool-1-thread-2]
> o.a.k.c.c.i.AbstractCoordinator - Attempt to heart beat failed since the
> group is rebalancing, try to re-join group.
> balance-1 flush @ 536637
> balance-1 committed offset for List(balance-11, balance-10, balance-9,
> balance-8, balance-7, balance-6)
> 2016-01-22 15:50:37,575 [INFO] [pool-1-thread-1]
> o.a.k.c.c.i.AbstractCoordinator - Attempt to heart beat failed since the
> group is rebalancing, try to re-join group.
> balance-0 flush @ 543845
> balance-0 committed offset for List(balance-5, balance-4, balance-3,
> balance-2, balance-1, balance-0)
> balance-0 got assigned List(balance-5, balance-4, balance-3)
> balance-1 got assigned List(balance-11, balance-10, balance-9)
> balance-1 pulled 3625 msgs ...
> balance-0 pulled 3621 msgs ...
> balance-0 pulled 3631 msgs ...
> balance-0 pulled 3631 msgs ...
> balance-1 pulled 0 msgs ...
> balance-0 pulled 3643 msgs ...
> balance-0 pulled 3643 msgs ...
> balance-1 pulled 0 msgs ...
> balance-0 pulled 3622 msgs ...
> balance-0 pulled 3632 msgs ...
> balance-1 pulled 0 msgs ...
> balance-0 pulled 3637 msgs ...
> balance-0 pulled 3641 msgs ...
> balance-0 pulled 3640 msgs ...
> balance-1 pulled 0 msgs ...
> balance-0 pulled 3632 msgs ...
> balance-0 pulled 3630 msgs ...
> balance-1 pulled 0 msgs ...
> ..
> 
> `balance-0` and `balance-1` are the names of the consumer thread. So after
> the rebalancing, thread `balance-1` continues to poll but no message
> arrive, given that it has got 3 partitions assigned to after the
> rebalancing.
> 
> Finally other 3 consumers pulls all their partitions' message, the
> situation is like
> 
> GROUP, TOPIC, PARTITION, CURRENT OFFSET, LOG END OFFSET, LAG, OWNER
> balance-test, balance, 9, 417467, 417467, 0, consumer-2_/127.0.0.1
> balance-test, balance, 10, 417467, 417467, 0, consumer-2_/127.0.0.1
> balance-test, balance, 11, 417467, 417467, 0, consumer-2_/127.0.0.1
> balance-test, balance, 6, 180269, 417467, 237198, consumer-2_/127.0.0.1
> balance-test, balance, 7, 180036, 417468, 237432, consumer-2_/127.0.0.1
> balance-test, balance, 8, 180197, 417467, 237270, consumer-2_/127.0.0.1
> balance-test, balance, 3, 417467, 417467, 0, consumer-1_/127.0.0.1
> balance-test, balance, 4, 417468, 417468, 0, consumer-1_/127.0.0.1
> balance-test, balance, 5, 417468, 417468, 0, consumer-1_/127.0.0.1
> balance-test, balance, 0, 417467, 417467, 0, consumer-1_/127.0.0.1
> balance-test, balance, 1, 417467, 417467, 0, consumer-1_/127.0.0.1
> balance-test, balance, 2, 417467, 417467, 0, consumer-1_/127.0.0.1
> 
> So you can see, partition [6, 7, 8] still has messages, but the consumer
> can't pull them after the rebalancing.
> 
> I've tried 0.9.0.0 release, trunk and 0.9.0 branch, for both server/broker
> and client.
> 
> I hope the code is clear enough to illustrate/reproduce the problem. It's
> quite a surprise for me because this is the main feature of the new
> consumer API, but it does not seem to work properly.
> Feel free to talk to me for any details.
> -- 
> *JU Han*
> 
> Software Engineer @ Teads.tv
> 
> +33 061960



Re: Stuck consumer with new consumer API in 0.9

2016-01-23 Thread Ismael Juma
Hi,

Can you please file an issue in JIRA so that we make sure this is
investigated?

Ismael

On Fri, Jan 22, 2016 at 3:13 PM, Han JU <ju.han.fe...@gmail.com> wrote:

> Hi,
>
> I'm prototyping with the new consumer API of kafka 0.9 and I'm particularly
> interested in the `ConsumerRebalanceListener`.
>
> My test setup is like the following:
>   - 5M messages pre-loaded in one node kafka 0.9
>   - 12 partitions, auto offset commit set to false
>   - in `onPartitionsRevoked`, commit offset and flush the local state
>
> The test run is like the following:
>   - launch one process with 2 consumers and let it consume for a while
>   - launch another process with 2 consumers, this triggers a rebalancing,
> and let these 2 processes run until messages are all consumed
>
> The code is here: https://gist.github.com/darkjh/fe1e5a5387bf13b4d4dd
>
> So at first, the 2 consumers of the first process each got 6 partitions.
> And after the rebalancing, each consumer got 3 partitions. It's confirmed
> by logging inside the `onPartitionAssigned` callback.
>
> But after the rebalancing, one of the 2 consumers of the first process stop
> receiving messages, even if it has partitions assigned to:
>
> balance-1 pulled 7237 msgs ...
> balance-0 pulled 7263 msgs ...
> 2016-01-22 15:50:37,533 [INFO] [pool-1-thread-2]
> o.a.k.c.c.i.AbstractCoordinator - Attempt to heart beat failed since the
> group is rebalancing, try to re-join group.
> balance-1 flush @ 536637
> balance-1 committed offset for List(balance-11, balance-10, balance-9,
> balance-8, balance-7, balance-6)
> 2016-01-22 15:50:37,575 [INFO] [pool-1-thread-1]
> o.a.k.c.c.i.AbstractCoordinator - Attempt to heart beat failed since the
> group is rebalancing, try to re-join group.
> balance-0 flush @ 543845
> balance-0 committed offset for List(balance-5, balance-4, balance-3,
> balance-2, balance-1, balance-0)
> balance-0 got assigned List(balance-5, balance-4, balance-3)
> balance-1 got assigned List(balance-11, balance-10, balance-9)
> balance-1 pulled 3625 msgs ...
> balance-0 pulled 3621 msgs ...
> balance-0 pulled 3631 msgs ...
> balance-0 pulled 3631 msgs ...
> balance-1 pulled 0 msgs ...
> balance-0 pulled 3643 msgs ...
> balance-0 pulled 3643 msgs ...
> balance-1 pulled 0 msgs ...
> balance-0 pulled 3622 msgs ...
> balance-0 pulled 3632 msgs ...
> balance-1 pulled 0 msgs ...
> balance-0 pulled 3637 msgs ...
> balance-0 pulled 3641 msgs ...
> balance-0 pulled 3640 msgs ...
> balance-1 pulled 0 msgs ...
> balance-0 pulled 3632 msgs ...
> balance-0 pulled 3630 msgs ...
> balance-1 pulled 0 msgs ...
> ..
>
> `balance-0` and `balance-1` are the names of the consumer thread. So after
> the rebalancing, thread `balance-1` continues to poll but no message
> arrive, given that it has got 3 partitions assigned to after the
> rebalancing.
>
> Finally other 3 consumers pulls all their partitions' message, the
> situation is like
>
> GROUP, TOPIC, PARTITION, CURRENT OFFSET, LOG END OFFSET, LAG, OWNER
> balance-test, balance, 9, 417467, 417467, 0, consumer-2_/127.0.0.1
> balance-test, balance, 10, 417467, 417467, 0, consumer-2_/127.0.0.1
> balance-test, balance, 11, 417467, 417467, 0, consumer-2_/127.0.0.1
> balance-test, balance, 6, 180269, 417467, 237198, consumer-2_/127.0.0.1
> balance-test, balance, 7, 180036, 417468, 237432, consumer-2_/127.0.0.1
> balance-test, balance, 8, 180197, 417467, 237270, consumer-2_/127.0.0.1
> balance-test, balance, 3, 417467, 417467, 0, consumer-1_/127.0.0.1
> balance-test, balance, 4, 417468, 417468, 0, consumer-1_/127.0.0.1
> balance-test, balance, 5, 417468, 417468, 0, consumer-1_/127.0.0.1
> balance-test, balance, 0, 417467, 417467, 0, consumer-1_/127.0.0.1
> balance-test, balance, 1, 417467, 417467, 0, consumer-1_/127.0.0.1
> balance-test, balance, 2, 417467, 417467, 0, consumer-1_/127.0.0.1
>
> So you can see, partition [6, 7, 8] still has messages, but the consumer
> can't pull them after the rebalancing.
>
> I've tried 0.9.0.0 release, trunk and 0.9.0 branch, for both server/broker
> and client.
>
> I hope the code is clear enough to illustrate/reproduce the problem. It's
> quite a surprise for me because this is the main feature of the new
> consumer API, but it does not seem to work properly.
> Feel free to talk to me for any details.
> --
> *JU Han*
>
> Software Engineer @ Teads.tv
>
> +33 061960
>


New Consumer API + Reactive Kafka

2015-12-02 Thread Krzysztof Ciesielski
Hello,

I’m the main maintainer of Reactive Kafka - a wrapper library that provides 
Kafka API as Reactive Streams (https://github.com/softwaremill/reactive-kafka).
I’m a bit concerned about switching to Kafka 0.9 because of the new Consumer
API, which doesn’t seem to fit well into this paradigm compared to the old
one. My main concerns are:

1. Our current code uses the KafkaIterator and reads messages sequentially, 
then sends them further upstream. In the new API, you cannot control how many 
messages are returned with poll(), so we would need to introduce some kind of 
in-memory buffering.
2. You cannot specify which offsets to commit. Our current native committer 
(https://github.com/softwaremill/reactive-kafka/blob/4055e88c09b8e08aefe8dbbd4748605df5779b07/core/src/main/scala/com/softwaremill/react/kafka/commit/native/NativeCommitter.scala)
 uses the OffsetCommitRequest/Response API and 
kafka.api.ConsumerMetadataRequest/Response for resolving brokers. Switching to 
Kafka 0.9 brings some compilation errors that raise questions.

My questions are:

1. Do I understand the capabilities and limitations of new API correctly? :)
2. Can we stay with the old iterator-based client, or is it going to get 
abandoned in future Kafka versions, or discouraged for some reasons?
3. Can we still use the OffsetCommitRequest/Response API to commit messages 
manually? If yes, could someone update this example: 
https://cwiki.apache.org/confluence/display/KAFKA/Committing+and+fetching+consumer+offsets+in+Kafka
 or give me a few hints on how to do this with 0.9?

By the way, we’d like our library to appear on the Ecosystem Wiki, I’m not sure 
how to request that officially :)

— 
Bests,
Chris
SoftwareMill
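
Regarding the in-memory buffering mentioned in point 1 above, a rough sketch
for a single consumer thread could look like this; consumer, running and
emitDownstream are placeholders for the stream's own plumbing:

    import java.util.ArrayDeque;
    import java.util.Deque;
    import org.apache.kafka.clients.consumer.ConsumerRecord;

    // poll() may hand back anywhere from zero to many records, so park them
    // in a local queue and release them downstream one at a time as demand
    // allows.
    Deque<ConsumerRecord<String, String>> buffer = new ArrayDeque<>();
    while (running) {
        if (buffer.isEmpty()) {
            for (ConsumerRecord<String, String> r : consumer.poll(100)) {
                buffer.add(r);
            }
        }
        ConsumerRecord<String, String> next = buffer.poll();
        if (next != null) {
            emitDownstream(next);   // placeholder: push one element to the stream
        }
    }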

Re: New Consumer API + Reactive Kafka

2015-12-02 Thread Gwen Shapira
On Wed, Dec 2, 2015 at 10:44 PM, Krzysztof Ciesielski <
krzysztof.ciesiel...@softwaremill.pl> wrote:

> Hello,
>
> I’m the main maintainer of Reactive Kafka - a wrapper library that
> provides Kafka API as Reactive Streams (
> https://github.com/softwaremill/reactive-kafka).
> I’m a bit concerned about switching to Kafka 0.9 because of the new
> Consumer API which doesn’t seem to fit well into this paradigm, comparing
> to the old one. My main concerns are:
>
> 1. Our current code uses the KafkaIterator and reads messages
> sequentially, then sends them further upstream. In the new API, you cannot
> control how many messages are returned with poll(), so we would need to
> introduce some kind of in-memory buffering.
> 2. You cannot specify which offsets to commit. Our current native
> committer (
> https://github.com/softwaremill/reactive-kafka/blob/4055e88c09b8e08aefe8dbbd4748605df5779b07/core/src/main/scala/com/softwaremill/react/kafka/commit/native/NativeCommitter.scala)
> uses the OffsetCommitRequest/Response API and
> kafka.api.ConsumerMetadataRequest/Response for resolving brokers. Switching
> to Kafka 0.9 brings some compilation errors that raise questions.
>
> My questions are:
>
> 1. Do I understand the capabilities and limitations of new API correctly?
> :)
>

The first limitation is correct - poll() may return any number of records
and you need to handle this.
The second is not correct - commitSync() can take a map of TopicPartition
to OffsetAndMetadata, so you can commit specific offsets for specific
partitions.
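
For example, committing an explicit offset for a single partition with the
0.9 API looks roughly like this (the topic name and lastProcessedOffset are
placeholders; the committed offset should be the position of the next
message to read):

    Map<TopicPartition, OffsetAndMetadata> offsets = new HashMap<>();
    offsets.put(new TopicPartition("my-topic", 0),
                new OffsetAndMetadata(lastProcessedOffset + 1));
    consumer.commitSync(offsets);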



> 2. Can we stay with the old iterator-based client, or is it going to get
> abandoned in future Kafka versions, or discouraged for some reasons?
>

It is already a bit behind - only the new client includes support for
secured clusters (authentication and encryption). It will get deprecated in
the future.


> 3. Can we still use the OffsetCommitRequest/Response API to commit
> messages manually? If yes, could someone update this example:
> https://cwiki.apache.org/confluence/display/KAFKA/Committing+and+fetching+consumer+offsets+in+Kafka
>  or
> give me a few hints on how to do this with 0.9?
>

AFAIK, the wire protocol and the API are not going anywhere. Hopefully you
can use the new objects we provide in the clients jar
(org.apache.kafka.common.requests).


>
> By the way, we’d like our library to appear on the Ecosystem Wiki, I’m not
> sure how to request that officially :)
>

Let us know what to write there and where to link :)


>
> —
> Bests,
> Chris
> SoftwareMill


Re: New Consumer API + Reactive Kafka

2015-12-02 Thread Krzysztof Ciesielski
I see, that’s actually a very important point, thanks Jay.
I think that we are very optimistic about updating Reactive Kafka now after 
getting all these details :)
I have one more question: in the new client we only have to call 
commitSync(offsets). This is a ‘void’ method so I suspect that it commits 
atomically?
In our current native committer, we have quite a lot of additional code for 
retries, reconnecting or finding a new channel coordinator. I suspect that the 
new API handles it all internally, and that if commitSync() fails it means the 
only thing we can do is kill the consumer and try to create a new one?

— 
Bests,
Chris
SoftwareMill
On 2 December 2015 at 17:42:24, Jay Kreps (j...@confluent.io) wrote:

It's worth noting that both the old and new consumer are identical in the  
number of records fetched at once and this is bounded by the fetch size and  
the number of partitions you subscribe to. The old consumer held these in  
memory internally and waited for you to ask for them, the new consumer  
immediately gives you what it has. Overall, though, the new consumer gives  
much better control over what is being fetched since it only uses memory  
when you call poll(); the old consumer had a background thread doing this  
which would only stop when it filled up a queue of unprocessed  
chunks...this is a lot harder to predict.  

-Jay  

On Wed, Dec 2, 2015 at 7:13 AM, Gwen Shapira <g...@confluent.io> wrote:  

> On Wed, Dec 2, 2015 at 10:44 PM, Krzysztof Ciesielski <  
> krzysztof.ciesiel...@softwaremill.pl> wrote:  
>  
> > Hello,  
> >  
> > I’m the main maintainer of Reactive Kafka - a wrapper library that  
> > provides Kafka API as Reactive Streams (  
> > https://github.com/softwaremill/reactive-kafka).  
> > I’m a bit concerned about switching to Kafka 0.9 because of the new  
> > Consumer API which doesn’t seem to fit well into this paradigm, comparing  
> > to the old one. My main concerns are:  
> >  
> > 1. Our current code uses the KafkaIterator and reads messages  
> > sequentially, then sends them further upstream. In the new API, you  
> cannot  
> > control how many messages are returned with poll(), so we would need to  
> > introduce some kind of in-memory buffering.  
> > 2. You cannot specify which offsets to commit. Our current native  
> > committer (  
> >  
> https://github.com/softwaremill/reactive-kafka/blob/4055e88c09b8e08aefe8dbbd4748605df5779b07/core/src/main/scala/com/softwaremill/react/kafka/commit/native/NativeCommitter.scala
>   
> )  
> > uses the OffsetCommitRequest/Response API and  
> > kafka.api.ConsumerMetadataRequest/Response for resolving brokers.  
> Switching  
> > to Kafka 0.9 brings some compilation errors that raise questions.  
> >  
> > My questions are:  
> >  
> > 1. Do I understand the capabilities and limitations of new API correctly?  
> > :)  
> >  
>  
> The first limitation is correct - poll() may return any number of records  
> and you need to handle this.  
> The second is not correct - commitSync() can take a map of TopicPartition  
> and Offsets, so you would only commit specific offsets of specific  
> partitions.  
>  
>  
>  
> > 2. Can we stay with the old iterator-based client, or is it going to get  
> > abandoned in future Kafka versions, or discouraged for some reasons?  
> >  
>  
> It is already a bit behind - only the new client includes support for  
> secured clusters (authentication and encryption). It will get deprecated in  
> the future.  
>  
>  
> > 3. Can we still use the OffsetCommitRequest/Response API to commit  
> > messages manually? If yes, could someone update this example:  
> >  
> https://cwiki.apache.org/confluence/display/KAFKA/Committing+and+fetching+consumer+offsets+in+Kafka
>   
> or  
> > give me a few hints on how to do this with 0.9?  
> >  
>  
> AFAIK, the wire protocol and the API is not going anywhere. Hopefully you  
> can use the new objects we provide in the clients jar  
> (org.apache.kafka.common.requests).  
>  
>  
> >  
> > By the way, we’d like our library to appear on the Ecosystem Wiki, I’m  
> not  
> > sure how to request that officially :)  
> >  
>  
> Let us know what to write there and where to link :)  
>  
>  
> >  
> > —  
> > Bests,  
> > Chris  
> > SoftwareMill  
>  


Re: New Consumer API + Reactive Kafka

2015-12-02 Thread Jay Kreps
It's worth noting that both the old and new consumer are identical in the
number of records fetched at once and this is bounded by the fetch size and
the number of partitions you subscribe to. The old consumer held these in
memory internally and waited for you to ask for them; the new consumer
immediately gives you what it has. Overall, though, the new consumer gives
much better control over what is being fetched since it only uses memory
when you call poll(); the old consumer had a background thread doing this
which would only stop when it filled up a queue of unprocessed
chunks...this is a lot harder to predict.
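
To put rough numbers on that bound, these are the relevant 0.9 consumer
settings (the values shown are just the defaults, not recommendations; the
worst case a poll() can hand back is roughly the per-partition cap times the
number of assigned partitions):

    Properties props = new Properties();
    // Per-partition cap on data returned by a single fetch request.
    props.put("max.partition.fetch.bytes", "1048576");  // 1 MB, the default
    // Minimum data the broker should accumulate before answering a fetch.
    props.put("fetch.min.bytes", "1");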

-Jay

On Wed, Dec 2, 2015 at 7:13 AM, Gwen Shapira <g...@confluent.io> wrote:

> On Wed, Dec 2, 2015 at 10:44 PM, Krzysztof Ciesielski <
> krzysztof.ciesiel...@softwaremill.pl> wrote:
>
> > Hello,
> >
> > I’m the main maintainer of Reactive Kafka - a wrapper library that
> > provides Kafka API as Reactive Streams (
> > https://github.com/softwaremill/reactive-kafka).
> > I’m a bit concerned about switching to Kafka 0.9 because of the new
> > Consumer API which doesn’t seem to fit well into this paradigm, comparing
> > to the old one. My main concerns are:
> >
> > 1. Our current code uses the KafkaIterator and reads messages
> > sequentially, then sends them further upstream. In the new API, you
> cannot
> > control how many messages are returned with poll(), so we would need to
> > introduce some kind of in-memory buffering.
> > 2. You cannot specify which offsets to commit. Our current native
> > committer (
> >
> https://github.com/softwaremill/reactive-kafka/blob/4055e88c09b8e08aefe8dbbd4748605df5779b07/core/src/main/scala/com/softwaremill/react/kafka/commit/native/NativeCommitter.scala
> )
> > uses the OffsetCommitRequest/Response API and
> > kafka.api.ConsumerMetadataRequest/Response for resolving brokers.
> Switching
> > to Kafka 0.9 brings some compilation errors that raise questions.
> >
> > My questions are:
> >
> > 1. Do I understand the capabilities and limitations of new API correctly?
> > :)
> >
>
> The first limitation is correct - poll() may return any number of records
> and you need to handle this.
> The second is not correct - commitSync() can take a map of TopicPartition
> and Offsets, so you would only commit specific offsets of specific
> partitions.
>
>
>
> > 2. Can we stay with the old iterator-based client, or is it going to get
> > abandoned in future Kafka versions, or discouraged for some reasons?
> >
>
> It is already a bit behind - only the new client includes support for
> secured clusters (authentication and encryption). It will get deprecated in
> the future.
>
>
> > 3. Can we still use the OffsetCommitRequest/Response API to commit
> > messages manually? If yes, could someone update this example:
> >
> https://cwiki.apache.org/confluence/display/KAFKA/Committing+and+fetching+consumer+offsets+in+Kafka
> or
> > give me a few hints on how to do this with 0.9?
> >
>
> AFAIK, the wire protocol and the API is not going anywhere. Hopefully you
> can use the new objects we provide in the clients jar
> (org.apache.kafka.common.requests).
>
>
> >
> > By the way, we’d like our library to appear on the Ecosystem Wiki, I’m
> not
> > sure how to request that officially :)
> >
>
> Let us know what to write there and where to link :)
>
>
> >
> > —
> > Bests,
> > Chris
> > SoftwareMill
>


Re: New Consumer API + Reactive Kafka

2015-12-02 Thread Guozhang Wang
In the new API commitSync() handles retries and reconnecting, and will only
throw an exception if it encounters a non-retriable error (e.g. it has been
told that the partitions it wants to commit no longer belong to it) or
a timeout has elapsed. You can find the possible exceptions thrown from
commitSync() here:

http://kafka.apache.org/090/javadoc/index.html?org/apache/kafka/clients/consumer/KafkaConsumer.html
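
In application code that typically reduces to something like the sketch
below; how to react is an application policy choice, not something the API
prescribes, and consumer, offsets and the logging are placeholders:

    try {
        consumer.commitSync(offsets);
    } catch (CommitFailedException e) {
        // Non-retriable: the group rebalanced and these partitions were
        // reassigned, so this commit can never succeed. Drop in-flight work
        // for them and re-join the group on the next poll().
        // (log/handle as appropriate)
    } catch (KafkaException e) {
        // Other non-retriable errors: give up on this consumer instance.
        consumer.close();
        throw e;
    }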

Guozhang


On Wed, Dec 2, 2015 at 8:58 AM, Krzysztof Ciesielski <
krzysztof.ciesiel...@softwaremill.pl> wrote:

> I see, that’s actually a very important point, thanks Jay.
> I think that we are very optimistic about updating Reactive Kafka now
> after getting all these details :)
> I have one more question: in the new client we only have to call
> commitSync(offsets). This is a ‘void’ method so I suspect that it commits
> atomically?
> In our current native committer, we have quite a lot of additional code
> for retries, reconnecting or finding new channel coordinator. I suspect
> that the new API handles it all internally and if commitSync() fails then
> it means that the only thing we can do is kill the consumer and try to
> create a new one?
>
> —
> Bests,
> Chris
> SoftwareMill
> On 2 December 2015 at 17:42:24, Jay Kreps (j...@confluent.io) wrote:
>
> It's worth noting that both the old and new consumer are identical in the
> number of records fetched at once and this is bounded by the fetch size and
> the number of partitions you subscribe to. The old consumer held these in
> memory internally and waited for you to ask for them, the new consumer
> immediately gives you what it has. Overall, though, the new consumer gives
> much better control over what is being fetched since it only uses memory
> when you call poll(); the old consumer had a background thread doing this
> which would only stop when it filled up a queue of unprocessed
> chunks...this is a lot harder to predict.
>
> -Jay
>
> On Wed, Dec 2, 2015 at 7:13 AM, Gwen Shapira <g...@confluent.io> wrote:
>
> > On Wed, Dec 2, 2015 at 10:44 PM, Krzysztof Ciesielski <
> > krzysztof.ciesiel...@softwaremill.pl> wrote:
> >
> > > Hello,
> > >
> > > I’m the main maintainer of Reactive Kafka - a wrapper library that
> > > provides Kafka API as Reactive Streams (
> > > https://github.com/softwaremill/reactive-kafka).
> > > I’m a bit concerned about switching to Kafka 0.9 because of the new
> > > Consumer API which doesn’t seem to fit well into this paradigm,
> comparing
> > > to the old one. My main concerns are:
> > >
> > > 1. Our current code uses the KafkaIterator and reads messages
> > > sequentially, then sends them further upstream. In the new API, you
> > cannot
> > > control how many messages are returned with poll(), so we would need to
> > > introduce some kind of in-memory buffering.
> > > 2. You cannot specify which offsets to commit. Our current native
> > > committer (
> > >
> >
> https://github.com/softwaremill/reactive-kafka/blob/4055e88c09b8e08aefe8dbbd4748605df5779b07/core/src/main/scala/com/softwaremill/react/kafka/commit/native/NativeCommitter.scala
> > )
> > > uses the OffsetCommitRequest/Response API and
> > > kafka.api.ConsumerMetadataRequest/Response for resolving brokers.
> > Switching
> > > to Kafka 0.9 brings some compilation errors that raise questions.
> > >
> > > My questions are:
> > >
> > > 1. Do I understand the capabilities and limitations of new API
> correctly?
> > > :)
> > >
> >
> > The first limitation is correct - poll() may return any number of records
> > and you need to handle this.
> > The second is not correct - commitSync() can take a map of TopicPartition
> > and Offsets, so you would only commit specific offsets of specific
> > partitions.
> >
> >
> >
> > > 2. Can we stay with the old iterator-based client, or is it going to
> get
> > > abandoned in future Kafka versions, or discouraged for some reasons?
> > >
> >
> > It is already a bit behind - only the new client includes support for
> > secured clusters (authentication and encryption). It will get deprecated
> in
> > the future.
> >
> >
> > > 3. Can we still use the OffsetCommitRequest/Response API to commit
> > > messages manually? If yes, could someone update this example:
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/Committing+and+fetching+consumer+offsets+in+Kafka
> > or
> > > give me a few hints on how to do this with 0.9?
> > >
> >
> > AFAIK, the wire protocol and the API is not going anywhere. Hopefully you
> > can use the new objects we provide in the clients jar
> > (org.apache.kafka.common.requests).
> >
> >
> > >
> > > By the way, we’d like our library to appear on the Ecosystem Wiki, I’m
> > not
> > > sure how to request that officially :)
> > >
> >
> > Let us know what to write there and where to link :)
> >
> >
> > >
> > > —
> > > Bests,
> > > Chris
> > > SoftwareMill
> >
>



-- 
-- Guozhang


Is 0.9 new consumer API compatible with 0.8.x.x broker

2015-11-30 Thread hsy...@gmail.com
Is 0.9 new consumer API compatible with 0.8.x.x broker


Re: Is 0.9 new consumer API compatible with 0.8.x.x broker

2015-11-30 Thread Wang, Howard
Thanks.

I just found the new KafkaConsumer does have two API functions,
assignment() and committed(TopicPartition partition). With these 2
functions, we'll be able to retrieve the timestamp of the last offset
regardless of whether offset storage is using ZK or the offset topic.
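
For example, a small sketch that walks the current assignment and reads back
the committed position; note that committed() returns an OffsetAndMetadata
(an offset plus an optional metadata string) and may be null if nothing has
been committed yet:

    for (TopicPartition tp : consumer.assignment()) {
        OffsetAndMetadata committed = consumer.committed(tp);
        if (committed != null) {
            System.out.println(tp + " -> offset " + committed.offset()
                + ", metadata " + committed.metadata());
        }
    }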

Howard



-- 
 Howard Wang
Engineering - Big Data and Personalization
Washington Post Media


1150 15th St NW, Washington, DC 20071
p. 202-334-9195 
Email: howard.w...@washpost.com





On 11/30/15, 3:55 PM, "hsy...@gmail.com" <hsy...@gmail.com> wrote:

>Is 0.9 new consumer API compatible with 0.8.x.x broker



Re: Is 0.9 new consumer API compatible with 0.8.x.x broker

2015-11-30 Thread Guozhang Wang
Siyuan,

In general the 0.9 new consumer API relies on the group coordinator on the
broker side to manage consumer groups, so you would need to upgrade the
brokers first.

However, if you are only using the assign() function to assign partitions
(i.e. no subscribe()), which does not need the group coordinator, the consumer
should work with an older broker version.
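
A minimal sketch of that assign()-only, coordinator-free usage (topic name,
partition numbers and offsets are placeholders; consumer is an existing
KafkaConsumer<String, String>):

    // Manual assignment: no subscribe(), no group coordinator, no rebalancing.
    consumer.assign(Arrays.asList(
            new TopicPartition("my-topic", 0),
            new TopicPartition("my-topic", 1)));
    // Position the consumer yourself, e.g. from offsets you store elsewhere.
    consumer.seek(new TopicPartition("my-topic", 0), 42L);  // placeholder offset
    consumer.seek(new TopicPartition("my-topic", 1), 17L);  // placeholder offset
    ConsumerRecords<String, String> records = consumer.poll(1000);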

Guozhang

On Mon, Nov 30, 2015 at 12:58 PM, Wang, Howard <howard.w...@washpost.com>
wrote:

> Thanks.
>
> I just found the new KafkaConsumer does have two API functions
> assignment() and
> committed(TopicPartition partition). With these 2 functions, we'll be able
> to retrieve the timestamp of last offset regardless whether offset storage
> is using ZK or offset topic.
>
> Howard
>
>
>
> --
>  Howard Wang
> Engineering - Big Data and Personalization
> Washington Post Media
>
>
> 1150 15th St NW, Washington, DC 20071
> p. 202-334-9195
> Email: howard.w...@washpost.com
>
>
>
>
>
> On 11/30/15, 3:55 PM, "hsy...@gmail.com" <hsy...@gmail.com> wrote:
>
> >Is 0.9 new consumer API compatible with 0.8.x.x broker
>
>


-- 
-- Guozhang


Re: Questions about new consumer API

2015-11-18 Thread hsy...@gmail.com
strategy settings) can not be used together?
> > > >
> > > > Also in the old API we found one thread per broker is the most
> > efficient
> > > > way to consume data, for example, if one process consumes from p1,
> p2,
> > p3
> > > > and p1,p2 are sitting on one broker b1, p3 is sitting on another one
> > b2,
> > > > the best thing is create 2 threads each thread use simple consumer
> API
> > > and
> > > > only consume from one broker.  I'm thinking how do I use the new API
> to
> > > do
> > > > this.
> > > >
> > > > Thanks,
> > > > Siyuan
> > > >
> > > > On Mon, Nov 16, 2015 at 4:43 PM, Guozhang Wang <wangg...@gmail.com>
> > > wrote:
> > > >
> > > > > Hi Siyuan,
> > > > >
> > > > > 1) new consumer is single-threaded, it does not maintain any
> internal
> > > > > threads as the old high-level consumer.
> > > > >
> > > > > 2) each consumer will only maintain one TCP connection with each
> > > broker.
> > > > > The only extra socket is the one with its coordinator. That is, if
> > > there
> > > > is
> > > > > three brokers S1, S2, S3, and S1 is the coordinator for this
> > consumer,
> > > it
> > > > > will maintain 4 sockets in total, 2 for S1 (one for fetching, one
> for
> > > > > coordinating) and 1 for S2 and S3 (only for fetching).
> > > > >
> > > > > 3) Currently the connection is not closed by consumer, although the
> > > > > underlying network client / selector will close idle ones after
> some
> > > > > timeout. So in worst case it will only maintain N+1 sockets in
> total
> > > for
> > > > N
> > > > > Kafka brokers at one time.
> > > > >
> > > > > Guozhang
> > > > >
> > > > > On Mon, Nov 16, 2015 at 4:22 PM, hsy...@gmail.com <
> hsy...@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > The new consumer API looks good. If I understand it correctly you
> > can
> > > > use
> > > > > > it like simple consumer or high-level consumer. But I have couple
> > > > > questions
> > > > > > about it's internal implementation
> > > > > >
> > > > > > First of all does the consumer have any internal fetcher threads
> > like
> > > > > > high-level consumer?
> > > > > >
> > > > > > When you assign multiple TopicPartitions to a consumer, how many
> > TCP
> > > > > > connections it establish to the brokers. Is it same as number of
> > > leader
> > > > > > brokers that host those partitions or just number of
> > TopicPartitions.
> > > > If
> > > > > > there is any leader broker change does it establish new
> > > > connections/using
> > > > > > existing connections to fetch the data? Can it continue
> consuming?
> > > Also
> > > > > is
> > > > > > the connection kept until the consumer is closed?
> > > > > >
> > > > > > Thanks!
> > > > > >
> > > > > > Best,
> > > > > > Siyuan
> > > > > >
> > > > >
> > > > >
> > > > >
> > > > > --
> > > > > -- Guozhang
> > > > >
> > > >
> > >
> >
>


Re: Questions about new consumer API

2015-11-17 Thread hsy...@gmail.com
Thanks Guozhang,

Maybe I should give a few words about what I'm going to achieve with the new API.

Currently, I'm building a new Kafka connector for Apache Apex (
http://apex.incubator.apache.org/) using the 0.9.0 API.
Apex supports dynamic partitioning, so in the old version we managed all the
consumer partitions with either a 1:1 strategy (each consumer process consumes
from only one Kafka partition) or a 1:n strategy (each consumer process can
consume from multiple Kafka partitions, distributed round-robin).
We also have a separate thread that monitors topic metadata changes (leader
broker change, new partition added, using internal APIs like ZkUtil etc.)
and does dynamic partitioning based on that (for example, auto-reconnecting to
a new leader broker, or creating a new consumer partition at runtime for a new
Kafka partition). You can see the high-level consumer doesn't work for this (it
can only balance between existing consumers unless you manually add a new one).
I'm wondering if the new consumer could be used to save some of the work we did
before.

I'm still confused about assign() and subscribe(). My understanding is that if
you use assign() only, the consumer becomes more like a simple consumer,
except that if the leader broker changes it automatically reconnects to the
new leader broker - is that correct? If you use the subscribe() method only,
then all the partitions will be distributed across the running consumer
processes with the same "group.id" using "partition.assignment.strategy". Is
that true?

So I assume assign() and subscribe() (and the group.id /
partition.assignment.strategy settings) cannot be used together?

Also, with the old API we found that one thread per broker is the most
efficient way to consume data. For example, if one process consumes from p1,
p2, p3, and p1, p2 are sitting on one broker b1 while p3 is sitting on another
one b2, the best thing is to create 2 threads, each using the simple consumer
API and consuming from only one broker. I'm wondering how to do the same with
the new API.

Thanks,
Siyuan

On Mon, Nov 16, 2015 at 4:43 PM, Guozhang Wang <wangg...@gmail.com> wrote:

> Hi Siyuan,
>
> 1) new consumer is single-threaded, it does not maintain any internal
> threads as the old high-level consumer.
>
> 2) each consumer will only maintain one TCP connection with each broker.
> The only extra socket is the one with its coordinator. That is, if there is
> three brokers S1, S2, S3, and S1 is the coordinator for this consumer, it
> will maintain 4 sockets in total, 2 for S1 (one for fetching, one for
> coordinating) and 1 for S2 and S3 (only for fetching).
>
> 3) Currently the connection is not closed by consumer, although the
> underlying network client / selector will close idle ones after some
> timeout. So in worst case it will only maintain N+1 sockets in total for N
> Kafka brokers at one time.
>
> Guozhang
>
> On Mon, Nov 16, 2015 at 4:22 PM, hsy...@gmail.com <hsy...@gmail.com>
> wrote:
>
> > The new consumer API looks good. If I understand it correctly you can use
> > it like simple consumer or high-level consumer. But I have couple
> questions
> > about it's internal implementation
> >
> > First of all does the consumer have any internal fetcher threads like
> > high-level consumer?
> >
> > When you assign multiple TopicPartitions to a consumer, how many TCP
> > connections it establish to the brokers. Is it same as number of leader
> > brokers that host those partitions or just number of TopicPartitions. If
> > there is any leader broker change does it establish new connections/using
> > existing connections to fetch the data? Can it continue consuming? Also
> is
> > the connection kept until the consumer is closed?
> >
> > Thanks!
> >
> > Best,
> > Siyuan
> >
>
>
>
> --
> -- Guozhang
>


Re: Questions about new consumer API

2015-11-17 Thread Jason Gustafson
Hi Siyuan,

Your understanding about assign/subscribe is correct. We think of topic
subscription as enabling automatic assignment, as opposed to doing manual
assignment through assign(). We don't currently allow them to be mixed.

Can you elaborate on your findings with respect to using one thread per
broker? In what sense was it more efficient? Doing the same thing might be
tricky with the new consumer, but I think you could do it using
partitionsFor() to find the current partition leaders and assign() to set
the assignment in each thread.
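
A rough sketch of that idea against the 0.9 consumer API (the topic name,
bootstrap address and bare-bones threading below are placeholders, and the
sketch ignores leaders moving after startup):

import java.util.*;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.Node;
import org.apache.kafka.common.PartitionInfo;
import org.apache.kafka.common.TopicPartition;

public class PerBrokerConsumers {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder
        props.put("key.deserializer",
            "org.apache.kafka.common.serialization.ByteArrayDeserializer");
        props.put("value.deserializer",
            "org.apache.kafka.common.serialization.ByteArrayDeserializer");
        String topic = "test-topic"; // placeholder

        // Use a throwaway consumer only to discover the current partition leaders.
        Map<Node, List<TopicPartition>> byLeader = new HashMap<>();
        try (KafkaConsumer<byte[], byte[]> meta = new KafkaConsumer<>(props)) {
            for (PartitionInfo p : meta.partitionsFor(topic)) {
                byLeader.computeIfAbsent(p.leader(), n -> new ArrayList<>())
                        .add(new TopicPartition(p.topic(), p.partition()));
            }
        }

        // One thread (with its own consumer) per leader broker, using manual assignment.
        for (Map.Entry<Node, List<TopicPartition>> e : byLeader.entrySet()) {
            new Thread(() -> {
                try (KafkaConsumer<byte[], byte[]> consumer = new KafkaConsumer<>(props)) {
                    consumer.assign(e.getValue());
                    while (true) {
                        ConsumerRecords<byte[], byte[]> records = consumer.poll(1000);
                        for (ConsumerRecord<byte[], byte[]> record : records) {
                            // process the record here
                        }
                    }
                }
            }, "consumer-broker-" + e.getKey().id()).start();
        }
    }
}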

-Jason

On Tue, Nov 17, 2015 at 10:25 AM, hsy...@gmail.com <hsy...@gmail.com> wrote:

> Thanks Guozhang,
>
> Maybe I should give a few words about what I'm going to achieve with new
> API
>
> Currently, I'm building a new kafka connector for Apache Apex(
> http://apex.incubator.apache.org/) using 0.9.0 API
> Apex support dynamic partition, so in the old version, We manage all the
> consumer partitions in either 1:1 strategy (each consumer process consumes
> only from one kafka partition) or 1:n strategy (each consumer process could
> consume from multiple kafka partitions, using round-robin to distribute)
> And we also have separate thread to monitor topic metadata change(leader
> broker change, new partition added, using internal API like ZkUtil etc)
> and do dynamic partition based on that(for example auto-reconnect to new
> leader broker, create new partition to consume from new kafka partition at
> runtime).  You can see High-level consumer doesn't work(It can only balance
> between existing consumers unless you manually add new one)  I'm thinking
> if the new consumer could be used to save some work we did before.
>
> I'm still confused with assign() and subscribe().  My understanding is if
> you use assign() only, the consumer becomes more like a simple consumer
> except if the leader broker changes it automatically reconnect to the new
> leader broker, is it correct?   If you use subscribe() method only then all
> the partitions will be distributed to running consumer process with same "
> group.id" using "partition.assignment.strategy". Is it true?
>
> So I assume assign() and subscribe()(and group.id
> partition.assignment.strategy settings) can not be used together?
>
> Also in the old API we found one thread per broker is the most efficient
> way to consume data, for example, if one process consumes from p1, p2, p3
> and p1,p2 are sitting on one broker b1, p3 is sitting on another one b2,
> the best thing is create 2 threads each thread use simple consumer API and
> only consume from one broker.  I'm thinking how do I use the new API to do
> this.
>
> Thanks,
> Siyuan
>
> On Mon, Nov 16, 2015 at 4:43 PM, Guozhang Wang <wangg...@gmail.com> wrote:
>
> > Hi Siyuan,
> >
> > 1) new consumer is single-threaded, it does not maintain any internal
> > threads as the old high-level consumer.
> >
> > 2) each consumer will only maintain one TCP connection with each broker.
> > The only extra socket is the one with its coordinator. That is, if there
> is
> > three brokers S1, S2, S3, and S1 is the coordinator for this consumer, it
> > will maintain 4 sockets in total, 2 for S1 (one for fetching, one for
> > coordinating) and 1 for S2 and S3 (only for fetching).
> >
> > 3) Currently the connection is not closed by consumer, although the
> > underlying network client / selector will close idle ones after some
> > timeout. So in worst case it will only maintain N+1 sockets in total for
> N
> > Kafka brokers at one time.
> >
> > Guozhang
> >
> > On Mon, Nov 16, 2015 at 4:22 PM, hsy...@gmail.com <hsy...@gmail.com>
> > wrote:
> >
> > > The new consumer API looks good. If I understand it correctly you can
> use
> > > it like simple consumer or high-level consumer. But I have couple
> > questions
> > > about it's internal implementation
> > >
> > > First of all does the consumer have any internal fetcher threads like
> > > high-level consumer?
> > >
> > > When you assign multiple TopicPartitions to a consumer, how many TCP
> > > connections it establish to the brokers. Is it same as number of leader
> > > brokers that host those partitions or just number of TopicPartitions.
> If
> > > there is any leader broker change does it establish new
> connections/using
> > > existing connections to fetch the data? Can it continue consuming? Also
> > is
> > > the connection kept until the consumer is closed?
> > >
> > > Thanks!
> > >
> > > Best,
> > > Siyuan
> > >
> >
> >
> >
> > --
> > -- Guozhang
> >
>


Re: Questions about new consumer API

2015-11-17 Thread hsy...@gmail.com
By efficiency, I mean maximizing throughput while minimizing resources on
both the broker side and the consumer side.

One example: say you have over 200 partitions on 10 brokers and you start 5
consumer processes to consume the data. If each process is single-threaded and
you distribute the load round-robin, then each process tries to fetch from
over 40 partitions one by one, possibly through 10 connections (around 50
overall). But if it's smart enough to group partitions by broker, each process
only needs 2 separate threads (consuming from 2 different brokers
concurrently). That seems the more optimal of the two, right?

On Tue, Nov 17, 2015 at 2:54 PM, Jason Gustafson <ja...@confluent.io> wrote:

> Hi Siyuan,
>
> Your understanding about assign/subscribe is correct. We think of topic
> subscription as enabling automatic assignment as opposed to doing manual
> assignment through assign(). We don't currently them to be mixed.
>
> Can you elaborate on your findings with respect to using one thread per
> broker? In what sense was it more efficient? Doing the same thing might be
> tricky with the new consumer, but I think you could do it using
> partitionsFor() to find the current partition leaders and assign() to set
> the assignment in each thread.
>
> -Jason
>
> On Tue, Nov 17, 2015 at 10:25 AM, hsy...@gmail.com <hsy...@gmail.com>
> wrote:
>
> > Thanks Guozhang,
> >
> > Maybe I should give a few words about what I'm going to achieve with new
> > API
> >
> > Currently, I'm building a new kafka connector for Apache Apex(
> > http://apex.incubator.apache.org/) using 0.9.0 API
> > Apex support dynamic partition, so in the old version, We manage all the
> > consumer partitions in either 1:1 strategy (each consumer process
> consumes
> > only from one kafka partition) or 1:n strategy (each consumer process
> could
> > consume from multiple kafka partitions, using round-robin to distribute)
> > And we also have separate thread to monitor topic metadata change(leader
> > broker change, new partition added, using internal API like ZkUtil etc)
> > and do dynamic partition based on that(for example auto-reconnect to new
> > leader broker, create new partition to consume from new kafka partition
> at
> > runtime).  You can see High-level consumer doesn't work(It can only
> balance
> > between existing consumers unless you manually add new one)  I'm thinking
> > if the new consumer could be used to save some work we did before.
> >
> > I'm still confused with assign() and subscribe().  My understanding is if
> > you use assign() only, the consumer becomes more like a simple consumer
> > except if the leader broker changes it automatically reconnect to the new
> > leader broker, is it correct?   If you use subscribe() method only then
> all
> > the partitions will be distributed to running consumer process with same
> "
> > group.id" using "partition.assignment.strategy". Is it true?
> >
> > So I assume assign() and subscribe()(and group.id
> > partition.assignment.strategy settings) can not be used together?
> >
> > Also in the old API we found one thread per broker is the most efficient
> > way to consume data, for example, if one process consumes from p1, p2, p3
> > and p1,p2 are sitting on one broker b1, p3 is sitting on another one b2,
> > the best thing is create 2 threads each thread use simple consumer API
> and
> > only consume from one broker.  I'm thinking how do I use the new API to
> do
> > this.
> >
> > Thanks,
> > Siyuan
> >
> > On Mon, Nov 16, 2015 at 4:43 PM, Guozhang Wang <wangg...@gmail.com>
> wrote:
> >
> > > Hi Siyuan,
> > >
> > > 1) new consumer is single-threaded, it does not maintain any internal
> > > threads as the old high-level consumer.
> > >
> > > 2) each consumer will only maintain one TCP connection with each
> broker.
> > > The only extra socket is the one with its coordinator. That is, if
> there
> > is
> > > three brokers S1, S2, S3, and S1 is the coordinator for this consumer,
> it
> > > will maintain 4 sockets in total, 2 for S1 (one for fetching, one for
> > > coordinating) and 1 for S2 and S3 (only for fetching).
> > >
> > > 3) Currently the connection is not closed by consumer, although the
> > > underlying network client / selector will close idle ones after some
> > > timeout. So in worst case it will only maintain N+1 sockets in total
> for
> > N
> > > Kafka brokers at one time.
> > >
> > > Guozhang
> > >
> > > On Mon, Nov 16, 201

Re: Questions about new consumer API

2015-11-17 Thread Jason Gustafson
> > > On Mon, Nov 16, 2015 at 4:43 PM, Guozhang Wang <wangg...@gmail.com>
> > wrote:
> > >
> > > > Hi Siyuan,
> > > >
> > > > 1) new consumer is single-threaded, it does not maintain any internal
> > > > threads as the old high-level consumer.
> > > >
> > > > 2) each consumer will only maintain one TCP connection with each
> > broker.
> > > > The only extra socket is the one with its coordinator. That is, if
> > there
> > > is
> > > > three brokers S1, S2, S3, and S1 is the coordinator for this
> consumer,
> > it
> > > > will maintain 4 sockets in total, 2 for S1 (one for fetching, one for
> > > > coordinating) and 1 for S2 and S3 (only for fetching).
> > > >
> > > > 3) Currently the connection is not closed by consumer, although the
> > > > underlying network client / selector will close idle ones after some
> > > > timeout. So in worst case it will only maintain N+1 sockets in total
> > for
> > > N
> > > > Kafka brokers at one time.
> > > >
> > > > Guozhang
> > > >
> > > > On Mon, Nov 16, 2015 at 4:22 PM, hsy...@gmail.com <hsy...@gmail.com>
> > > > wrote:
> > > >
> > > > > The new consumer API looks good. If I understand it correctly you
> can
> > > use
> > > > > it like simple consumer or high-level consumer. But I have couple
> > > > questions
> > > > > about it's internal implementation
> > > > >
> > > > > First of all does the consumer have any internal fetcher threads
> like
> > > > > high-level consumer?
> > > > >
> > > > > When you assign multiple TopicPartitions to a consumer, how many
> TCP
> > > > > connections it establish to the brokers. Is it same as number of
> > leader
> > > > > brokers that host those partitions or just number of
> TopicPartitions.
> > > If
> > > > > there is any leader broker change does it establish new
> > > connections/using
> > > > > existing connections to fetch the data? Can it continue consuming?
> > Also
> > > > is
> > > > > the connection kept until the consumer is closed?
> > > > >
> > > > > Thanks!
> > > > >
> > > > > Best,
> > > > > Siyuan
> > > > >
> > > >
> > > >
> > > >
> > > > --
> > > > -- Guozhang
> > > >
> > >
> >
>


Questions about new consumer API

2015-11-16 Thread hsy...@gmail.com
The new consumer API looks good. If I understand it correctly, you can use it
like the simple consumer or like the high-level consumer. But I have a couple
of questions about its internal implementation.

First of all, does the consumer have any internal fetcher threads like the
high-level consumer?

When you assign multiple TopicPartitions to a consumer, how many TCP
connections does it establish to the brokers? Is it the same as the number of
leader brokers that host those partitions, or just the number of
TopicPartitions? If there is a leader broker change, does it establish new
connections or use existing connections to fetch the data? Can it continue
consuming? Also, is the connection kept until the consumer is closed?

Thanks!

Best,
Siyuan


Re: Questions about new consumer API

2015-11-16 Thread Guozhang Wang
Hi Siyuan,

1) The new consumer is single-threaded; it does not maintain any internal
threads like the old high-level consumer does.

2) Each consumer will only maintain one TCP connection with each broker. The
only extra socket is the one to its coordinator. That is, if there are three
brokers S1, S2, S3, and S1 is the coordinator for this consumer, it will
maintain 4 sockets in total: 2 for S1 (one for fetching, one for coordinating)
and 1 each for S2 and S3 (only for fetching).

3) Currently the connection is not closed by the consumer, although the
underlying network client / selector will close idle ones after some timeout.
So in the worst case it will maintain only N+1 sockets in total for N Kafka
brokers at one time.

Guozhang

On Mon, Nov 16, 2015 at 4:22 PM, hsy...@gmail.com <hsy...@gmail.com> wrote:

> The new consumer API looks good. If I understand it correctly you can use
> it like simple consumer or high-level consumer. But I have couple questions
> about it's internal implementation
>
> First of all does the consumer have any internal fetcher threads like
> high-level consumer?
>
> When you assign multiple TopicPartitions to a consumer, how many TCP
> connections it establish to the brokers. Is it same as number of leader
> brokers that host those partitions or just number of TopicPartitions. If
> there is any leader broker change does it establish new connections/using
> existing connections to fetch the data? Can it continue consuming? Also is
> the connection kept until the consumer is closed?
>
> Thanks!
>
> Best,
> Siyuan
>



-- 
-- Guozhang


Re: Question on 0.9 new consumer API

2015-11-12 Thread Grant Henke
The new consumer (0.9.0) will not be compatible with older brokers (0.8.2).
In general you should upgrade brokers before upgrading clients. The old
clients (0.8.2) will work on the new brokers (0.9.0).

Thanks,
Grant

On Thu, Nov 12, 2015 at 7:52 AM, Han JU <ju.han.fe...@gmail.com> wrote:

> Hello,
>
> Just want to know if the new consumer API coming with 0.9 will be
> compatible with 0.8 broker servers? We're looking at the new consumer
> because the new rebalancing listener is very interesting for one of our use
> case.
>
> Another question is that if we have to upgrade our brokers to 0.9, will
> they accept producers in 0.8.2?
>
> Thanks!
>
> --
> *JU Han*
>
> Software Engineer @ Teads.tv
>
> +33 061960
>



-- 
Grant Henke
Software Engineer | Cloudera
gr...@cloudera.com | twitter.com/gchenke | linkedin.com/in/granthenke


Re: Question on 0.9 new consumer API

2015-11-12 Thread Han JU
Ok thanks for your confirmation!

2015-11-12 15:19 GMT+01:00 Grant Henke <ghe...@cloudera.com>:

> The new consumer (0.9.0) will not be compatible with older brokers (0.8.2).
> In general you should upgrade brokers before upgrading clients. The old
> clients (0.8.2) will work on the new brokers (0.9.0).
>
> Thanks,
> Grant
>
> On Thu, Nov 12, 2015 at 7:52 AM, Han JU <ju.han.fe...@gmail.com> wrote:
>
> > Hello,
> >
> > Just want to know if the new consumer API coming with 0.9 will be
> > compatible with 0.8 broker servers? We're looking at the new consumer
> > because the new rebalancing listener is very interesting for one of our
> use
> > case.
> >
> > Another question is that if we have to upgrade our brokers to 0.9, will
> > they accept producers in 0.8.2?
> >
> > Thanks!
> >
> > --
> > *JU Han*
> >
> > Software Engineer @ Teads.tv
> >
> > +33 061960
> >
>
>
>
> --
> Grant Henke
> Software Engineer | Cloudera
> gr...@cloudera.com | twitter.com/gchenke | linkedin.com/in/granthenke
>



-- 
*JU Han*

Software Engineer @ Teads.tv

+33 061960


new consumer API & release 0.8.3

2015-09-04 Thread Shashank Singh
Hi

I am eager to use the enhanced consumer API, which provides better control
over things like offset management. From reading through the forums I believe
it is coming as part of the 0.8.3 release, but there is no tentative date for
it.

Can you please give any hint on that? Also, which is the best forum for asking
how these new APIs are shaping up and for the details about them?

-- 

Warm Regards,

Shashank

Mobile: +91 9910478553

Linkedin: in.linkedin.com/pub/shashank-singh/13/763/906/


RE: new consumer api?

2015-08-04 Thread Simon Cooper
Reading the consumer docs, there's no mention of a relatively simple consumer 
that doesn't need groups, coordinators, commits, or anything like that - one that 
just reads and polls from specified offsets of specific topic partitions, but 
automatically deals with leadership changes and connection losses (so one level 
up from SimpleConsumer).

Will the new API be usable in this relatively simple way?
SimonC

-Original Message-
From: Jun Rao [mailto:j...@confluent.io] 
Sent: 03 August 2015 18:19
To: users@kafka.apache.org
Subject: Re: new consumer api?

Jalpesh,

We are still iterating on the new consumer a bit and are waiting for some of 
the security jiras to be committed. So now, we are shooting for releasing 0.8.3 
in Oct (just updated 
https://cwiki.apache.org/confluence/display/KAFKA/Future+release+plan).

Thanks,

Jun

On Mon, Aug 3, 2015 at 8:41 AM, Jalpesh Patadia  
jalpesh.pata...@clickbank.com wrote:

 Hello guys,

 A while ago i read that the new consumer api was going to be released 
 sometime in July as part of the 0.8.3/0.9 release.
 https://cwiki.apache.org/confluence/display/KAFKA/Future+release+plan


 Do we have an update when we think that can happen?


 Thanks,

 Jalpesh


 -- PRIVILEGED AND CONFIDENTIAL This transmission may contain 
 privileged, proprietary or confidential information. If you are not 
 the intended recipient, you are instructed not to review this 
 transmission. If you are not the intended recipient, please notify the 
 sender that you received this message and delete this transmission from your 
 system.



Re: New Consumer API and Range Consumption with Fail-over

2015-08-04 Thread Bhavesh Mistry
Hi Jason and Kafka Dev Team,



First of all, thanks for responding; I think you described the expected
behavior correctly.

The use case is offset-range consumption. Every minute we store the highest
offset for each topic partition in a time-series database. So if we need to
reload or re-consume yesterday's data from, say, 8 AM to noon, we have the
start-offset mapping at 8 AM and the end-offset mapping at noon.

I was trying to cover this use case with the new consumer API. Would you or
the Kafka dev team consider a request for an API that takes a topic and its
start/end offsets for a high-level consumer group? (With the older consumer
API we used the simple consumer for this, without fail-over.) Also, each
range consumption would use a different group id, and the group id would not
be reused. The main purpose is to reload or reprocess past data occasionally
(due to production bugs, downtime etc.) while the main consumer group
continues to consume the latest records.


void subscribe(TopicPartition[] startOffsetPartitions, TopicPartition[]
endOffsetPartitions)



or something similar, which would allow the following:

1) When the consumer group already exists (meaning it has consumed data and
committed offsets to a storage system, either Kafka or ZK), ignore the start
offset positions and use the committed offsets. If nothing is committed, use
the start offset for the partition.

2) When consumption of a partition reaches its end offset, pausing is fine, or
the assigned thread becomes available for fail-over or waits for reassignment.

3) When the consumer group is done consuming all partitions' offset ranges
(start to end), gracefully shut down the entire consumer group.

4) While consuming records, if one of the nodes or consuming threads goes
down, automatically fail over to the others (similar to the high-level
consumer in the old consumer API; I am not sure whether high-level and/or
simple consumer concepts exist for the new API).

I hope the above explanation clarifies the use case and the intended behavior.
Thanks for the clarifications, and you are correct: we need
pause(TopicPartition tp), resume(TopicPartition tp), and/or an API to set the
end offset for each partition.



Please do let us know your preference for supporting this simple use case.


Thanks,


Bhavesh

On Thu, Jul 30, 2015 at 1:23 PM, Jason Gustafson ja...@confluent.io wrote:

 Hi Bhavesh,

 I'm not totally sure I understand the expected behavior, but I think this
 can work. Instead of seeking to the start of the range before the poll
 loop, you should probably provide a ConsumerRebalanceCallback to get
 notifications when group assignment has changed (e.g. when one of your
 nodes dies). When a new partition is assigned, the callback will be invoked
 by the consumer and you can use it to check if there's a committed position
 in the range or if you need to seek to the beginning of the range. For
 example:

 void onPartitionsAssigned(consumer, partitions) {
   for (partition : partitions) {
  try {
offset = consumer.committed(partition)
consumer.seek(partition, offset)
  } catch (NoOffsetForPartition) {
consumer.seek(partition, rangeStart)
  }
   }
 }

 If a failure occurs, then the partitions will be rebalanced across
 whichever consumers are still active. The case of the entire cluster being
 rebooted is not really different. When the consumers come back, they check
 the committed position and resume where they left off. Does that make
 sense?

 After you are finished consuming a partition's range, you can use
 KafkaConsumer.pause(partition) to prevent further fetches from being
 initiated while still maintaining the current assignment. The patch to add
 pause() is not in trunk yet, but it probably will be before too long.

 One potential problem is that you wouldn't be able to reuse the same group
 to consume a different range because of the way it depends on the committed
 offsets. Kafka's commit API actually allows some additional metadata to go
 along with a committed offset and that could potentially be used to tie the
 commit to the range, but it's not yet exposed in KafkaConsumer. I assume it
 will be eventually, but I'm not sure whether that will be part of the
 initial release.


 Hope that helps!

 Jason

 On Thu, Jul 30, 2015 at 7:54 AM, Bhavesh Mistry 
 mistry.p.bhav...@gmail.com
 wrote:

  Hello Kafka Dev Team,
 
 
  With new Consumer API redesign  (
 
 
 https://github.com/apache/kafka/blob/trunk/clients/src/main/java/org/apache/kafka/clients/consumer/KafkaConsumer.java
  ),  is there a capability to consume given the topic and partition
 start/
  end position.  How would I achieve following use case of range
 consumption
  with fail-over.
 
 
  Use Case:
  Ability to reload data given topic and its partition offset start/end
 with
  High Level Consumer with fail over.   Basically, High Level Range
  consumption and consumer group dies while main consumer group.
 
 
  Suppose you have a topic called “test-topic” and its partition begin and
  end offset.
 
  {
 
  topic:  test-topic

Re: new consumer api?

2015-08-04 Thread Jason Gustafson
Hey Simon,

The new consumer has the ability to forego group management and assign
partitions directly. Once assigned, you can seek to any offset you want.
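
For example, a minimal sketch of that usage against the 0.9 client (broker
address, topic, partition and starting offset are placeholders):

import java.util.Arrays;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;

public class AssignAndSeek {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder
        props.put("key.deserializer",
            "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
            "org.apache.kafka.common.serialization.StringDeserializer");
        // No group.id is needed when you only assign() and manage offsets yourself.

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            TopicPartition tp = new TopicPartition("my-topic", 0); // placeholder
            consumer.assign(Arrays.asList(tp));
            consumer.seek(tp, 12345L); // start from an arbitrary offset

            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(1000);
                for (ConsumerRecord<String, String> record : records)
                    System.out.printf("%d: %s%n", record.offset(), record.value());
            }
        }
    }
}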

-Jason

On Tue, Aug 4, 2015 at 5:08 AM, Simon Cooper 
simon.coo...@featurespace.co.uk wrote:

 Reading on the consumer docs, there's no mention of a relatively simple
 consumer that doesn't need groups, coordinators, commits, anything like
 that - just read and poll from specified offsets of specific topic
 partitions - but automatically deals with leadership changes and connection
 losses (so one level up from SimpleConsumer).

 Will the new API be able to be used in this relatively simple way?
 SimonC

 -Original Message-
 From: Jun Rao [mailto:j...@confluent.io]
 Sent: 03 August 2015 18:19
 To: users@kafka.apache.org
 Subject: Re: new consumer api?

 Jalpesh,

 We are still iterating on the new consumer a bit and are waiting for some
 of the security jiras to be committed. So now, we are shooting for
 releasing 0.8.3 in Oct (just updated
 https://cwiki.apache.org/confluence/display/KAFKA/Future+release+plan).

 Thanks,

 Jun

 On Mon, Aug 3, 2015 at 8:41 AM, Jalpesh Patadia 
 jalpesh.pata...@clickbank.com wrote:

  Hello guys,
 
  A while ago i read that the new consumer api was going to be released
  sometime in July as part of the 0.8.3/0.9 release.
  https://cwiki.apache.org/confluence/display/KAFKA/Future+release+plan
 
 
  Do we have an update when we think that can happen?
 
 
  Thanks,
 
  Jalpesh
 
 
  -- PRIVILEGED AND CONFIDENTIAL This transmission may contain
  privileged, proprietary or confidential information. If you are not
  the intended recipient, you are instructed not to review this
  transmission. If you are not the intended recipient, please notify the
  sender that you received this message and delete this transmission from
 your system.
 



Re: new consumer api?

2015-08-03 Thread Jun Rao
Jalpesh,

We are still iterating on the new consumer a bit and are waiting for some
of the security jiras to be committed. So now, we are shooting for
releasing 0.8.3 in Oct (just updated
https://cwiki.apache.org/confluence/display/KAFKA/Future+release+plan).

Thanks,

Jun

On Mon, Aug 3, 2015 at 8:41 AM, Jalpesh Patadia 
jalpesh.pata...@clickbank.com wrote:

 Hello guys,

 A while ago i read that the new consumer api was going to be released
 sometime in July as part of the 0.8.3/0.9 release.
 https://cwiki.apache.org/confluence/display/KAFKA/Future+release+plan


 Do we have an update when we think that can happen?


 Thanks,

 Jalpesh


 -- PRIVILEGED AND CONFIDENTIAL This transmission may contain privileged,
 proprietary or confidential information. If you are not the intended
 recipient, you are instructed not to review this transmission. If you are
 not the intended recipient, please notify the sender that you received this
 message and delete this transmission from your system.



New Consumer API and Range Consumption with Fail-over

2015-07-30 Thread Bhavesh Mistry
Hello Kafka Dev Team,


With new Consumer API redesign  (
https://github.com/apache/kafka/blob/trunk/clients/src/main/java/org/apache/kafka/clients/consumer/KafkaConsumer.java
), is there a capability to consume from a given topic and partition between a
start and an end position? How would I achieve the following use case of range
consumption with fail-over?


Use Case:
Ability to reload data for a topic given per-partition start/end offsets,
using the high-level consumer with fail-over. Basically, high-level range
consumption in a separate consumer group that runs alongside the main
consumer group.


Suppose you have a topic called “test-topic” and its partition begin and
end offset.

{
  topic: test-topic,
  [ { partition id: 1, offset start: 100,     offset end: 500,000 },
    { partition id: 2, offset start: 200,000, offset end: 500,000 },
    ….. for n partitions
  ]
}

Say you create a consumer group “Range-Consumer” and use the seek method for
each partition. Your feedback is greatly appreciated.


In each JVM,

For each consumption thread:

Consumer consumer = new KafkaConsumer({ group.id = "Range-consumer" } ...)

Map<Integer, Integer> partitionToEndOffsetMapping = ....

for (TopicPartition tp : topicPartitionList) {
    consumer.seek(tp, startOffset);
}

while (true) {
    ConsumerRecords records = consumer.poll(1);
    // for each record check the offset
    record = records.iterator().next();
    if (partitionToEndOffsetMapping(record.getPartition()) >= record.getOffset()) {
        // consume record
        // commit offset
        consumer.commit(CommitType.SYNC);
    } else {
        // Should I unsubscribe now for this partition?
        consumer.unsubscribe(record.getPartition());
    }
}




Please let me know if the above approach is valid:

1) How will fail-over work?

2) How does rebooting the entire consumer group impact the offset seek, since
offsets are stored by Kafka itself?

Thanks ,

Bhavesh


Re: New Consumer API and Range Consumption with Fail-over

2015-07-30 Thread Jason Gustafson
Hi Bhavesh,

I'm not totally sure I understand the expected behavior, but I think this
can work. Instead of seeking to the start of the range before the poll
loop, you should probably provide a ConsumerRebalanceCallback to get
notifications when group assignment has changed (e.g. when one of your
nodes dies). When a new partition is assigned, the callback will be invoked
by the consumer and you can use it to check if there's a committed position
in the range or if you need to seek to the beginning of the range. For
example:

void onPartitionsAssigned(consumer, partitions) {
  for (partition : partitions) {
 try {
   offset = consumer.committed(partition)
   consumer.seek(partition, offset)
 } catch (NoOffsetForPartition) {
   consumer.seek(partition, rangeStart)
 }
  }
}

If a failure occurs, then the partitions will be rebalanced across
whichever consumers are still active. The case of the entire cluster being
rebooted is not really different. When the consumers come back, they check
the committed position and resume where they left off. Does that make sense?

After you are finished consuming a partition's range, you can use
KafkaConsumer.pause(partition) to prevent further fetches from being
initiated while still maintaining the current assignment. The patch to add
pause() is not in trunk yet, but it probably will be before too long.
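
To make that concrete, here is a rough sketch of a range-bounded poll loop
written against the released 0.9 client, so the exact signatures
(pause(TopicPartition...), commitSync()) may differ slightly from the draft
API discussed here; the end-offset map and record processing are placeholders:

import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;

public class RangeConsumeSketch {
    /**
     * Consume each assigned partition up to (and including) the offset in
     * endOffsets, pausing a partition once its end is reached. Assumes the
     * caller has already assigned the partitions and sought to the range
     * start (e.g. in a rebalance callback as above).
     */
    static void consumeRange(KafkaConsumer<byte[], byte[]> consumer,
                             Map<TopicPartition, Long> endOffsets) {
        Set<TopicPartition> remaining = new HashSet<>(endOffsets.keySet());
        while (!remaining.isEmpty()) {
            ConsumerRecords<byte[], byte[]> records = consumer.poll(1000);
            for (ConsumerRecord<byte[], byte[]> record : records) {
                TopicPartition tp = new TopicPartition(record.topic(), record.partition());
                Long end = endOffsets.get(tp);
                if (end == null)
                    continue;                    // not part of this range job
                if (record.offset() <= end) {
                    // process(record) ...
                }
                if (record.offset() >= end && remaining.remove(tp)) {
                    consumer.pause(tp);          // 0.9 signature; later releases take a Collection
                }
            }
            // Simplified checkpoint: commits the current position, which can be
            // a few records past the range end if the last batch overshot it.
            consumer.commitSync();
        }
    }
}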

One potential problem is that you wouldn't be able to reuse the same group
to consume a different range because of the way it depends on the committed
offsets. Kafka's commit API actually allows some additional metadata to go
along with a committed offset and that could potentially be used to tie the
commit to the range, but it's not yet exposed in KafkaConsumer. I assume it
will be eventually, but I'm not sure whether that will be part of the
initial release.


Hope that helps!

Jason

On Thu, Jul 30, 2015 at 7:54 AM, Bhavesh Mistry mistry.p.bhav...@gmail.com
wrote:

 Hello Kafka Dev Team,


 With new Consumer API redesign  (

 https://github.com/apache/kafka/blob/trunk/clients/src/main/java/org/apache/kafka/clients/consumer/KafkaConsumer.java
 ),  is there a capability to consume given the topic and partition  start/
 end position.  How would I achieve following use case of range consumption
 with fail-over.


 Use Case:
 Ability to reload data given topic and its partition offset start/end with
 High Level Consumer with fail over.   Basically, High Level Range
 consumption and consumer group dies while main consumer group.


 Suppose you have a topic called “test-topic” and its partition begin and
 end offset.

 {

 topic:  test-topic,

 [   {  partition id : 1 , offset start:   100,  offset end:
 500,000 },


 {  partition id : 2 ,  offset start:   200,000, offset end:
 500,000

 ….. for n partitions

 ]

 }

 Each you create consumer group: “Range-Consumer “ and use seek method and
 for each partition.   Your feedback is greatly appreciated.


 In each JVM,


 For each consumption tread:


 Consumer c = KafkaConsumer( { group.id=”Range-consumer}…)

 MapInteger, Integer parttionTOEndOfsetMapping ….

 for(TopicPartition tp : topicPartitionlist){

 seek(TopicPartition(Parition 1), long offset)

 }



 while(true){

 ConsumerRecords records = consumer.poll(1);

 // for each record check the offset

 record = record.iterator().next();

 if(parttionTOEndOfsetMapping(record.getPartition()) =
 record.getoffset) {
   // consume  record

 //commit  offset

   consumer.commit(CommitType.SYNC);

 }else {

 // Should I unsubscribe it now  for this partition
 ?

 consumer.unscribe(record.getPartition)

 }



 }




 Please let me know if the above approach is valid:

 1) how will fail-over work.

 2) how Rebooting entire consumer group impacts offset seek ? Since offset
 are stored by Kafka itsself.

 Thanks ,

 Bhavesh



New consumer API used in mirror maker

2015-07-12 Thread tao xiao
Hi team,

The trunk code of mirror maker now uses the old consumer API, Is there any
plan to use new Java consumer api in mirror maker?


Re: New consumer API used in mirror maker

2015-07-12 Thread Jiangjie Qin
Yes, we are going to use new consumer after it is ready.


Jiangjie (Becket) Qin

On 7/12/15, 8:21 PM, tao xiao xiaotao...@gmail.com wrote:

Hi team,

The trunk code of mirror maker now uses the old consumer API, Is there any
plan to use new Java consumer api in mirror maker?



Is the new consumer API ready?

2015-07-10 Thread Simon Cooper
I'm updating the Kafka APIs we use to the new standalone ones, but it looks like 
the new consumer isn't ready yet (the code has lots of placeholders etc.), and 
there's only the producer in the Javadoc at 
http://kafka.apache.org/082/javadoc/index.html. Is there an ETA on when the new 
consumer will be ready?

SimonC


Re: Is the new consumer API ready?

2015-07-10 Thread Guozhang Wang
Hi Simon,

The API will be available in the next release, which is planned in about a
month.

At the meantime you could start trying it out from trunk if you want.

Guozhang

On Fri, Jul 10, 2015 at 1:24 AM, Simon Cooper 
simon.coo...@featurespace.co.uk wrote:

 I'm updating the kafka APIs we use to the new standalone ones, but it look
 like the new consumer isn't ready yet (the code has got lots of
 placeholders etc), and there's only the producer in the Javadoc at
 http://kafka.apache.org/082/javadoc/index.html. Is there an ETA on when
 the new consumer will be ready?

 SimonC




-- 
-- Guozhang


Questions about new consumer API

2014-12-02 Thread hsy...@gmail.com
Hi guys,

I'm interested in the new Consumer API.
http://people.apache.org/~nehanarkhede/kafka-0.9-consumer-javadoc/doc/

I have a couple of questions.
1. The doc says the Kafka consumer will automatically do load balancing. Is it
based on throughput, or, as today, on balancing the partition count among all
consumers in the same ConsumerGroup? In a real deployment, different
partitions can have different peak times.
2. In the API there is a subscribe(partition...) method described as not using
group management. Does that mean the group.id property is discarded and the
developer has full control over distributing partitions to consumers?
3. Is the new API compatible with old brokers?
4. Will the simple consumer API and the high-level consumer API still be
supported?

Thanks!

Best,
Siyuan


Re: Questions about new consumer API

2014-12-02 Thread Neha Narkhede
1. In this doc it says kafka consumer will automatically do load balance.
Is it based on throughtput or same as what we have now balance the
cardinality among all consumers in same ConsumerGroup? In a real case
different partitions could have different peak time.

Load balancing is still based on # of partitions for the subscribed topics
and
ensuring that each partition has exactly one consumer as the owner.

2. In the API, threre is subscribe(partition...) method saying not using
group management, does it mean the group.id property will be discarded and
developer has full control of distributing partitions to consumers?

group.id is also required for offset management, if the user chooses to use
Kafka based offset management. The user will have full control over
distribution
of partitions to consumers.

3. Is new API compatible with old broker?

Yes, it will.

4. Will simple consumer api and high-level consumer api still be supported?

Over time, we will phase out the current high-level and simple consumer
since the
0.9 API supports both.

Thanks,
Neha

On Tue, Dec 2, 2014 at 12:07 PM, hsy...@gmail.com hsy...@gmail.com wrote:

 Hi guys,

 I'm interested in the new Consumer API.
 http://people.apache.org/~nehanarkhede/kafka-0.9-consumer-javadoc/doc/

 I have couple of question.
 1. In this doc it says kafka consumer will automatically do load balance.
 Is it based on throughtput or same as what we have now balance the
 cardinality among all consumers in same ConsumerGroup? In a real case
 different partitions could have different peak time.
 2. In the API, threre is subscribe(partition...) method saying not using
 group management, does it mean the group.id property will be discarded and
 developer has full control of distributing partitions to consumers?
 3. Is new API compatible with old broker?
 4. Will simple consumer api and high-level consumer api still be supported?

 Thanks!

 Best,
 Siyuan



Re: Questions about new consumer API

2014-12-02 Thread hsy...@gmail.com
Thanks Neha. Another question: if offsets are stored under group.id, does it
mean that within one group there should be at most one subscriber for each
topic partition?

Best,
Siyuan

On Tue, Dec 2, 2014 at 12:55 PM, Neha Narkhede neha.narkh...@gmail.com
wrote:

 1. In this doc it says kafka consumer will automatically do load balance.
 Is it based on throughtput or same as what we have now balance the
 cardinality among all consumers in same ConsumerGroup? In a real case
 different partitions could have different peak time.

 Load balancing is still based on # of partitions for the subscribed topics
 and
 ensuring that each partition has exactly one consumer as the owner.

 2. In the API, threre is subscribe(partition...) method saying not using
 group management, does it mean the group.id property will be discarded and
 developer has full control of distributing partitions to consumers?

 group.id is also required for offset management, if the user chooses to
 use
 Kafka based offset management. The user will have full control over
 distribution
 of partitions to consumers.

 3. Is new API compatible with old broker?

 Yes, it will.

 4. Will simple consumer api and high-level consumer api still be supported?

 Over time, we will phase out the current high-level and simple consumer
 since the
 0.9 API supports both.

 Thanks,
 Neha

 On Tue, Dec 2, 2014 at 12:07 PM, hsy...@gmail.com hsy...@gmail.com
 wrote:

  Hi guys,
 
  I'm interested in the new Consumer API.
  http://people.apache.org/~nehanarkhede/kafka-0.9-consumer-javadoc/doc/
 
  I have couple of question.
  1. In this doc it says kafka consumer will automatically do load balance.
  Is it based on throughtput or same as what we have now balance the
  cardinality among all consumers in same ConsumerGroup? In a real case
  different partitions could have different peak time.
  2. In the API, threre is subscribe(partition...) method saying not using
  group management, does it mean the group.id property will be discarded
 and
  developer has full control of distributing partitions to consumers?
  3. Is new API compatible with old broker?
  4. Will simple consumer api and high-level consumer api still be
 supported?
 
  Thanks!
 
  Best,
  Siyuan
 



Re: Questions about new consumer API

2014-12-02 Thread Neha Narkhede
The offsets are keyed on (group, topic, partition), so if you have more than
one owner per partition, they will overwrite each other's offsets and lead to
incorrect state.

On Tue, Dec 2, 2014 at 2:32 PM, hsy...@gmail.com hsy...@gmail.com wrote:

 Thanks Neha, another question, so if offsets are stored under group.id,
 dose it mean in one group, there should be at most one subscriber for each
 topic partition?

 Best,
 Siyuan

 On Tue, Dec 2, 2014 at 12:55 PM, Neha Narkhede neha.narkh...@gmail.com
 wrote:

  1. In this doc it says kafka consumer will automatically do load balance.
  Is it based on throughtput or same as what we have now balance the
  cardinality among all consumers in same ConsumerGroup? In a real case
  different partitions could have different peak time.
 
  Load balancing is still based on # of partitions for the subscribed
 topics
  and
  ensuring that each partition has exactly one consumer as the owner.
 
  2. In the API, threre is subscribe(partition...) method saying not using
  group management, does it mean the group.id property will be discarded
 and
  developer has full control of distributing partitions to consumers?
 
  group.id is also required for offset management, if the user chooses to
  use
  Kafka based offset management. The user will have full control over
  distribution
  of partitions to consumers.
 
  3. Is new API compatible with old broker?
 
  Yes, it will.
 
  4. Will simple consumer api and high-level consumer api still be
 supported?
 
  Over time, we will phase out the current high-level and simple consumer
  since the
  0.9 API supports both.
 
  Thanks,
  Neha
 
  On Tue, Dec 2, 2014 at 12:07 PM, hsy...@gmail.com hsy...@gmail.com
  wrote:
 
   Hi guys,
  
   I'm interested in the new Consumer API.
   http://people.apache.org/~nehanarkhede/kafka-0.9-consumer-javadoc/doc/
  
   I have couple of question.
   1. In this doc it says kafka consumer will automatically do load
 balance.
   Is it based on throughtput or same as what we have now balance the
   cardinality among all consumers in same ConsumerGroup? In a real case
   different partitions could have different peak time.
   2. In the API, threre is subscribe(partition...) method saying not
 using
   group management, does it mean the group.id property will be discarded
  and
   developer has full control of distributing partitions to consumers?
   3. Is new API compatible with old broker?
   4. Will simple consumer api and high-level consumer api still be
  supported?
  
   Thanks!
  
   Best,
   Siyuan
  
 



Re: quick question about new consumer api

2014-07-07 Thread Guozhang Wang
Hi Jason,

In the new design, consumption is still at the per-partition granularity. The
main rationale for this is ordering: within a partition we want to preserve
ordering, so that a message B produced after message A is also consumed and
processed after message A. Producers can use keys to make sure messages in the
same ordering group land in the same partition. To guarantee this we have to
make each partition consumed by only a single client at a time. On the other
hand, when one wants to grow the number of consumers beyond the number of
partitions, one can always use the topic tool to dynamically add more
partitions to the topic.
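
For reference, adding partitions with the topic tool looks roughly like this
(topic name and ZooKeeper address are placeholders):

bin/kafka-topics.sh --zookeeper localhost:2181 --alter --topic my-topic --partitions 16

Note that keys are then spread over more partitions, so per-key ordering only
holds for messages produced after the change.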

Do you have a specific scenario in mind that would require single-partition
topics?

Guozhang



On Mon, Jul 7, 2014 at 7:43 AM, Jason Rosenberg j...@squareup.com wrote:

 I've been looking at the new consumer api outlined here:

 https://cwiki.apache.org/confluence/display/KAFKA/Kafka+0.9+Consumer+Rewrite+Design

 One issue in the current high-level consumer, is that it does not do a good
 job of distributing a set of topics between multiple consumers, unless each
 topic has multiple partitions.  This has always seemed strange to me, since
 at the end of the day, even for single partition topics, the basic unit of
 consumption is still at the partition level (so you'd expect rebalancing to
 try to evenly distribute partitions (regardless of the topic)).

 It's not clearly spelled out in the new consumer api wiki, so I'll just
 ask, will this issue be addressed in the new api?  I think I've asked this
 before, but I wanted to go check again, and am not seeing this explicitly
 addressed in the design.

 Thanks

 Jason




-- 
-- Guozhang


Re: quick question about new consumer api

2014-07-07 Thread Jason Rosenberg
Guozhang,

I'm not suggesting we parallelize within a partition.

The problem with the current high-level consumer is that if you use a regex to
select multiple topics and then have multiple consumers in the same group,
usually the first consumer will 'own' all the topics, and no amount of
subsequent rebalancing will allow other consumers in the group to own some of
the topics. Rebalancing does allow other consumers to own multiple partitions,
but if a topic has only 1 partition, only the first consumer to initialize
will get all the work.

So, I'm wondering if the new API will be better about rebalancing the work at
the partition level rather than the topic level.

Jason


On Mon, Jul 7, 2014 at 11:17 AM, Guozhang Wang wangg...@gmail.com wrote:

 Hi Jason,

 In the new design the consumption is still at the per-partition
 granularity. The main rationale of doing this is ordering: Within a
 partition we want to preserve the ordering such that message B produced
 after message A will also be consumed and processed after message A. And
 producers can use keys to make sure messages with the same ordering group
 will be in the same partition. To do this we have to make one partition
 only being consumed by a single client at a time. On the other hand, when
 one wants to add the number of consumers beyond the number of partitions,
 he can always use the topic tool to dynamically add more partitions to the
 topic.

 Do you have a specific scenario in mind that would require single-partition
 topics?

 Guozhang



 On Mon, Jul 7, 2014 at 7:43 AM, Jason Rosenberg j...@squareup.com wrote:

  I've been looking at the new consumer api outlined here:
 
 
 https://cwiki.apache.org/confluence/display/KAFKA/Kafka+0.9+Consumer+Rewrite+Design
 
  One issue in the current high-level consumer, is that it does not do a
 good
  job of distributing a set of topics between multiple consumers, unless
 each
  topic has multiple partitions.  This has always seemed strange to me,
 since
  at the end of the day, even for single partition topics, the basic unit
 of
  consumption is still at the partition level (so you'd expect rebalancing
 to
  try to evenly distribute partitions (regardless of the topic)).
 
  It's not clearly spelled out in the new consumer api wiki, so I'll just
  ask, will this issue be addressed in the new api?  I think I've asked
 this
  before, but I wanted to go check again, and am not seeing this explicitly
  addressed in the design.
 
  Thanks
 
  Jason
 



 --
 -- Guozhang



Re: quick question about new consumer api

2014-07-07 Thread Guozhang Wang
I see your point now. The old consumer does have a hard-coded
round-robin-per-topic logic which has this issue. In the new consumer, we will
make the assignment logic customizable so that people can specify whatever
rebalance algorithms they like.
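
For what it's worth, in the released 0.9 client this surfaced as a pluggable
assignor selected through consumer configuration; a round-robin assignor that
balances at the partition level across topics can be chosen with a config
fragment like the following (class name as in the 0.9 client):

import java.util.Properties;

Properties props = new Properties();
props.put("group.id", "my-group");   // placeholder group
// Pick partition-level round-robin assignment instead of the default range assignor.
props.put("partition.assignment.strategy",
          "org.apache.kafka.clients.consumer.RoundRobinAssignor");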

Also I will soon send out a new consumer design summary email for more
comments. Feel free to give us more thoughts you have about the new
consumer design.

Guozhang


On Mon, Jul 7, 2014 at 8:44 AM, Jason Rosenberg j...@squareup.com wrote:

 Guozhang,

 I'm not suggesting we parallelize within a partition

 The problem with the current high-level consumer is, if you use a regex to
 select multiple topics, and then have multiple consumers in the same group,
 usually the first consumer will 'own' all the topics, and no amount of
 sub-sequent rebalancing will allow other consumers in the group to own some
 of the topics.  Re-balancing does allow other consumers to own multiple
 partitions, but if a topic has only 1 partition, only the first consumer to
 initialize will get all the work.

 So, I'm wondering if the new api will be better about re-balancing the work
 at the partition level, and not the topic level, as such.

 Jason


 On Mon, Jul 7, 2014 at 11:17 AM, Guozhang Wang wangg...@gmail.com wrote:

  Hi Jason,
 
  In the new design the consumption is still at the per-partition
  granularity. The main rationale of doing this is ordering: Within a
  partition we want to preserve the ordering such that message B produced
  after message A will also be consumed and processed after message A. And
  producers can use keys to make sure messages with the same ordering group
  will be in the same partition. To do this we have to make one partition
  only being consumed by a single client at a time. On the other hand, when
  one wants to add the number of consumers beyond the number of partitions,
  he can always use the topic tool to dynamically add more partitions to
 the
  topic.
 
  Do you have a specific scenario in mind that would require
 single-partition
  topics?
 
  Guozhang
 
 
 
  On Mon, Jul 7, 2014 at 7:43 AM, Jason Rosenberg j...@squareup.com
 wrote:
 
   I've been looking at the new consumer api outlined here:
  
  
 
 https://cwiki.apache.org/confluence/display/KAFKA/Kafka+0.9+Consumer+Rewrite+Design
  
   One issue in the current high-level consumer, is that it does not do a
  good
   job of distributing a set of topics between multiple consumers, unless
  each
   topic has multiple partitions.  This has always seemed strange to me,
  since
   at the end of the day, even for single partition topics, the basic unit
  of
   consumption is still at the partition level (so you'd expect
 rebalancing
  to
   try to evenly distribute partitions (regardless of the topic)).
  
   It's not clearly spelled out in the new consumer api wiki, so I'll just
   ask, will this issue be addressed in the new api?  I think I've asked
  this
   before, but I wanted to go check again, and am not seeing this
 explicitly
   addressed in the design.
  
   Thanks
  
   Jason
  
 
 
 
  --
  -- Guozhang
 




-- 
-- Guozhang


Re: quick question about new consumer api

2014-07-07 Thread Jason Rosenberg
Great, that's reassuring!

What's the time frame for having a more or less stable version to try out?

Jason


On Mon, Jul 7, 2014 at 12:59 PM, Guozhang Wang wangg...@gmail.com wrote:

 I see your point now. The old consumer does have a hard-coded
 round-robin-per-topic logic which have this issue. In the new consumer,
 we will make the assignment logic customizable so that people can specify
 different rebalance algorithms they like.

 Also I will soon send out a new consumer design summary email for more
 comments. Feel free to give us more thoughts you have about the new
 consumer design.

 Guozhang


 On Mon, Jul 7, 2014 at 8:44 AM, Jason Rosenberg j...@squareup.com wrote:

  Guozhang,
 
  I'm not suggesting we parallelize within a partition
 
  The problem with the current high-level consumer is, if you use a regex
 to
  select multiple topics, and then have multiple consumers in the same
 group,
  usually the first consumer will 'own' all the topics, and no amount of
  sub-sequent rebalancing will allow other consumers in the group to own
 some
  of the topics.  Re-balancing does allow other consumers to own multiple
  partitions, but if a topic has only 1 partition, only the first consumer
 to
  initialize will get all the work.
 
  So, I'm wondering if the new api will be better about re-balancing the
 work
  at the partition level, and not the topic level, as such.
 
  Jason
 
 
  On Mon, Jul 7, 2014 at 11:17 AM, Guozhang Wang wangg...@gmail.com
 wrote:
 
   Hi Jason,
  
   In the new design the consumption is still at the per-partition
   granularity. The main rationale of doing this is ordering: Within a
   partition we want to preserve the ordering such that message B produced
   after message A will also be consumed and processed after message A.
 And
   producers can use keys to make sure messages with the same ordering
 group
   will be in the same partition. To do this we have to make one partition
   only being consumed by a single client at a time. On the other hand,
 when
   one wants to add the number of consumers beyond the number of
 partitions,
   he can always use the topic tool to dynamically add more partitions to
  the
   topic.
  
   Do you have a specific scenario in mind that would require
  single-partition
   topics?
  
   Guozhang
  
  
  
   On Mon, Jul 7, 2014 at 7:43 AM, Jason Rosenberg j...@squareup.com
  wrote:
  
I've been looking at the new consumer api outlined here:
   
   
  
 
 https://cwiki.apache.org/confluence/display/KAFKA/Kafka+0.9+Consumer+Rewrite+Design
   
One issue in the current high-level consumer, is that it does not do
 a
   good
job of distributing a set of topics between multiple consumers,
 unless
   each
topic has multiple partitions.  This has always seemed strange to me,
   since
at the end of the day, even for single partition topics, the basic
 unit
   of
consumption is still at the partition level (so you'd expect
  rebalancing
   to
try to evenly distribute partitions (regardless of the topic)).
   
It's not clearly spelled out in the new consumer api wiki, so I'll
 just
ask, will this issue be addressed in the new api?  I think I've asked
   this
before, but I wanted to go check again, and am not seeing this
  explicitly
addressed in the design.
   
Thanks
   
Jason
   
  
  
  
   --
   -- Guozhang
  
 



 --
 -- Guozhang



Re: quick question about new consumer api

2014-07-07 Thread Guozhang Wang
We plan to have a working prototype ready end of September.

Guozhang


On Mon, Jul 7, 2014 at 11:05 AM, Jason Rosenberg j...@squareup.com wrote:

 Great, that's reassuring!

 What's the time frame for having a more or less stable version to try out?

 Jason


 On Mon, Jul 7, 2014 at 12:59 PM, Guozhang Wang wangg...@gmail.com wrote:

  I see your point now. The old consumer does have a hard-coded
  round-robin-per-topic logic which have this issue. In the new consumer,
  we will make the assignment logic customizable so that people can specify
  different rebalance algorithms they like.
 
  Also I will soon send out a new consumer design summary email for more
  comments. Feel free to give us more thoughts you have about the new
  consumer design.
 
  Guozhang
 
 
  On Mon, Jul 7, 2014 at 8:44 AM, Jason Rosenberg j...@squareup.com
 wrote:
 
   Guozhang,
  
   I'm not suggesting we parallelize within a partition
  
   The problem with the current high-level consumer is, if you use a regex
  to
   select multiple topics, and then have multiple consumers in the same
  group,
   usually the first consumer will 'own' all the topics, and no amount of
   sub-sequent rebalancing will allow other consumers in the group to own
  some
   of the topics.  Re-balancing does allow other consumers to own multiple
   partitions, but if a topic has only 1 partition, only the first
 consumer
  to
   initialize will get all the work.
  
   So, I'm wondering if the new api will be better about re-balancing the
  work
   at the partition level, and not the topic level, as such.
  
   Jason
  
  
   On Mon, Jul 7, 2014 at 11:17 AM, Guozhang Wang wangg...@gmail.com wrote:

    Hi Jason,

    In the new design the consumption is still at the per-partition
    granularity. The main rationale for doing this is ordering: within a
    partition we want to preserve the ordering such that message B produced
    after message A will also be consumed and processed after message A. And
    producers can use keys to make sure messages in the same ordering group
    end up in the same partition. To do this we have to ensure that a
    partition is only consumed by a single client at a time. On the other
    hand, when one wants to grow the number of consumers beyond the number
    of partitions, one can always use the topic tool to dynamically add more
    partitions to the topic.

    Do you have a specific scenario in mind that would require
    single-partition topics?

    Guozhang



    On Mon, Jul 7, 2014 at 7:43 AM, Jason Rosenberg j...@squareup.com wrote:

     I've been looking at the new consumer api outlined here:
     https://cwiki.apache.org/confluence/display/KAFKA/Kafka+0.9+Consumer+Rewrite+Design

     One issue in the current high-level consumer is that it does not do a
     good job of distributing a set of topics between multiple consumers
     unless each topic has multiple partitions. This has always seemed
     strange to me, since at the end of the day, even for single-partition
     topics, the basic unit of consumption is still the partition (so you'd
     expect rebalancing to try to evenly distribute partitions, regardless
     of the topic).

     It's not clearly spelled out in the new consumer api wiki, so I'll
     just ask: will this issue be addressed in the new api? I think I've
     asked this before, but I wanted to check again, and am not seeing this
     explicitly addressed in the design.

     Thanks

     Jason



    --
    -- Guozhang
   
  
 
 
 
  --
  -- Guozhang
 




-- 
-- Guozhang
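
A side note on the ordering point above: the "ordering group" is simply the
record key on the producer side. A minimal sketch with the Java producer that
shipped alongside the new consumer (class names from the released client;
topic, key and config values here are made up for illustration):

    import java.util.Properties;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;

    Properties props = new Properties();
    props.put("bootstrap.servers", "localhost:9092");
    props.put("key.serializer",
        "org.apache.kafka.common.serialization.StringSerializer");
    props.put("value.serializer",
        "org.apache.kafka.common.serialization.StringSerializer");

    KafkaProducer<String, String> producer = new KafkaProducer<>(props);
    // Records sharing a key hash to the same partition, so a single consumer
    // sees them in production order; records with different keys may land on
    // different partitions and interleave.
    producer.send(new ProducerRecord<>("orders", "customer-42", "created"));
    producer.send(new ProducerRecord<>("orders", "customer-42", "paid"));
    producer.close();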


Re: New Consumer API discussion

2014-03-27 Thread Neha Narkhede
 the
 consumer's
 offsets.
 
 12. If I wish to decouple the consumer from the offset checkpointing, is
 it OK to use Joel's offset management stuff directly, rather than
 through
 the consumer's commit API?
 
 
 Cheers,
 Chris
 
 On 2/10/14 10:54 AM, Neha Narkhede neha.narkh...@gmail.com wrote:
 
 As mentioned in previous emails, we are also working on a
 re-implementation
 of the consumer. I would like to use this email thread to discuss the
 details of the public API. I would also like us to be picky about this
 public api now so it is as good as possible and we don't need to break
 it
 in the future.
 
  The best way to get a feel for the API is actually to take a look at the
  javadoc
  http://people.apache.org/~nehanarkhede/kafka-0.9-consumer-javadoc/doc/kafka/clients/consumer/KafkaConsumer.html,
 the hope is to get the api docs good enough so that it is
 self-explanatory.
  You can also take a look at the configs here
  http://people.apache.org/~nehanarkhede/kafka-0.9-consumer-javadoc/doc/kafka/clients/consumer/ConsumerConfig.html
 
 Some background info on implementation:
 
 At a high level the primary difference in this consumer is that it
 removes
 the distinction between the high-level and low-level consumer. The
 new
 consumer API is non blocking and instead of returning a blocking
 iterator,
 the consumer provides a poll() API that returns a list of records. We
 think
 this is better compared to the blocking iterators since it effectively
 decouples the threading strategy used for processing messages from the
 consumer. It is worth noting that the consumer is entirely single
 threaded
 and runs in the user thread. The advantage is that it can be easily
 rewritten in less multi-threading-friendly languages. The consumer
 batches
 data and multiplexes I/O over TCP connections to each of the brokers it
 communicates with, for high throughput. The consumer also allows long
 poll
 to reduce the end-to-end message latency for low throughput data.
 
 The consumer provides a group management facility that supports the
 concept
 of a group with multiple consumer instances (just like the current
 consumer). This is done through a custom heartbeat and group management
 protocol transparent to the user. At the same time, it allows users the
 option to subscribe to a fixed set of partitions and not use group
 management at all. The offset management strategy defaults to Kafka
 based
 offset management and the API provides a way for the user to use a
 customized offset store to manage the consumer's offsets.
 
 A key difference in this consumer also is the fact that it does not
 depend
 on zookeeper at all.
 
  More details about the new consumer design are here
  https://cwiki.apache.org/confluence/display/KAFKA/Kafka+0.9+Consumer+Rewrite+Design
 
  Please take a look at the new API
  http://people.apache.org/~nehanarkhede/kafka-0.9-consumer-javadoc/doc/kafka/clients/consumer/KafkaConsumer.html
  and give us any thoughts you may have.
 
 Thanks,
 Neha
 






Re: New Consumer API discussion

2014-03-24 Thread Neha Narkhede
 wrote:
 
 As mentioned in previous emails, we are also working on a
 re-implementation
 of the consumer. I would like to use this email thread to discuss the
 details of the public API. I would also like us to be picky about this
 public api now so it is as good as possible and we don't need to break
 it
 in the future.
 
  The best way to get a feel for the API is actually to take a look at the
  javadoc
  http://people.apache.org/~nehanarkhede/kafka-0.9-consumer-javadoc/doc/kafka/clients/consumer/KafkaConsumer.html,
 the hope is to get the api docs good enough so that it is
 self-explanatory.
  You can also take a look at the configs here
  http://people.apache.org/~nehanarkhede/kafka-0.9-consumer-javadoc/doc/kafka/clients/consumer/ConsumerConfig.html
 
 Some background info on implementation:
 
 At a high level the primary difference in this consumer is that it
 removes
 the distinction between the high-level and low-level consumer. The
 new
 consumer API is non blocking and instead of returning a blocking
 iterator,
 the consumer provides a poll() API that returns a list of records. We
 think
 this is better compared to the blocking iterators since it effectively
 decouples the threading strategy used for processing messages from the
 consumer. It is worth noting that the consumer is entirely single
 threaded
 and runs in the user thread. The advantage is that it can be easily
 rewritten in less multi-threading-friendly languages. The consumer
 batches
 data and multiplexes I/O over TCP connections to each of the brokers it
 communicates with, for high throughput. The consumer also allows long
 poll
 to reduce the end-to-end message latency for low throughput data.
 
 The consumer provides a group management facility that supports the
 concept
 of a group with multiple consumer instances (just like the current
 consumer). This is done through a custom heartbeat and group management
 protocol transparent to the user. At the same time, it allows users the
 option to subscribe to a fixed set of partitions and not use group
 management at all. The offset management strategy defaults to Kafka
 based
 offset management and the API provides a way for the user to use a
 customized offset store to manage the consumer's offsets.
 
 A key difference in this consumer also is the fact that it does not
 depend
 on zookeeper at all.
 
  More details about the new consumer design are here
  https://cwiki.apache.org/confluence/display/KAFKA/Kafka+0.9+Consumer+Rewrite+Design
 
  Please take a look at the new API
  http://people.apache.org/~nehanarkhede/kafka-0.9-consumer-javadoc/doc/kafka/clients/consumer/KafkaConsumer.html
  and give us any thoughts you may have.
 
 Thanks,
 Neha
 





Re: New Consumer API discussion

2014-03-17 Thread Neha Narkhede
I'm not quite sure if I fully understood your question. The consumer API
exposes a close() method that will shutdown the consumer's connections to
all brokers and frees up resources that the consumer uses.

I've updated the javadoc for the new consumer API to include a few examples
of different ways of using the consumer. Probably you might find it useful
-
http://people.apache.org/~nehanarkhede/kafka-0.9-consumer-javadoc/doc/kafka/clients/consumer/KafkaConsumer.html

Thanks,
Neha
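
A rough sketch of that shutdown pattern, assuming the consumer as it later
shipped (close() from the polling thread, plus wakeup(), which was added
precisely so another thread can abort a blocking poll); the topic name and
flag are illustrative:

    import java.util.Collections;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.common.errors.WakeupException;

    // Polling thread (consumer and the volatile 'running' flag created elsewhere)
    try {
        consumer.subscribe(Collections.singletonList("events"));
        while (running) {
            ConsumerRecords<String, String> records = consumer.poll(1000);
            // process records ...
        }
    } catch (WakeupException e) {
        // expected during shutdown, triggered by wakeup() below
    } finally {
        consumer.close();  // closes connections to all brokers and frees resources
    }

    // Shutdown thread, e.g. after looking the consumer up by group id
    running = false;
    consumer.wakeup();     // the one consumer method that is safe to call from another thread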


On Sun, Mar 16, 2014 at 7:55 PM, Shanmugam, Srividhya 
srividhyashanmu...@fico.com wrote:

 Can the consumer API provide a way to shut down the connector by doing a
 look up by the consumer group id? For example, the application may be
 consuming the messages in one thread whereas the shutdown call can be
 initiated in a different thread.

 This email and any files transmitted with it are confidential, proprietary
 and intended solely for the individual or entity to whom they are
 addressed. If you have received this email in error please delete it
 immediately.



Re: New Consumer API discussion

2014-03-16 Thread Shanmugam, Srividhya
Can the consumer API provide a way to shut down the connector by doing a look
up by the consumer group id? For example, the application may be consuming the
messages in one thread whereas the shutdown call can be initiated in a
different thread.

This email and any files transmitted with it are confidential, proprietary and 
intended solely for the individual or entity to whom they are addressed. If you 
have received this email in error please delete it immediately.


Re: New Consumer API discussion

2014-03-03 Thread Chris Riccomini
Hey Guys,

Sorry for the late follow up. Here are my questions/thoughts on the API:

1. Why is the config String-Object instead of String-String?

2. Are these Java docs correct?

  KafkaConsumer(java.util.Map<java.lang.String,java.lang.Object> configs)
  A consumer is instantiated by providing a set of key-value pairs as
configuration and a ConsumerRebalanceCallback implementation

There is no ConsumerRebalanceCallback parameter.

3. Would like to have a method:

  poll(long timeout, java.util.concurrent.TimeUnit timeUnit,
TopicPartition... topicAndPartitionsToPoll)

I see I can effectively do this by just fiddling with subscribe and
unsubscribe before each poll. Is this a low-overhead operation? Can I just
unsubscribe from everything after each poll, then re-subscribe to a topic
the next iteration. I would probably be doing this in a fairly tight loop.

4. The behavior of AUTO_OFFSET_RESET_CONFIG is overloaded. I think there
are use cases for decoupling what to do when no offset exists from what
to do when I'm out of range. I might want to start from smallest the
first time I run, but fail if I ever get offset out of range.

5. ENABLE_JMX could use Java docs, even though it's fairly
self-explanatory.

6. Clarity about whether FETCH_BUFFER_CONFIG is per-topic/partition, or
across all topic/partitions is useful. I believe it's per-topic/partition,
right? That is, setting to 2 megs with two TopicAndPartitions would result
in 4 megs worth of data coming in per fetch, right?

7. What does the consumer do if METADATA_FETCH_TIMEOUT_CONFIG times out?
Retry, or throw exception?

8. Does RECONNECT_BACKOFF_MS_CONFIG apply to both metadata requests and
fetch requests?

9. What does SESSION_TIMEOUT_MS default to?

10. Is this consumer thread-safe?

11. How do you use a different offset management strategy? Your email
implies that it's pluggable, but I don't see how. The offset management
strategy defaults to Kafka based offset management and the API provides a
way for the user to use a customized offset store to manage the consumer's
offsets.

12. If I wish to decouple the consumer from the offset checkpointing, is
it OK to use Joel's offset management stuff directly, rather than through
the consumer's commit API?


Cheers,
Chris

On 2/10/14 10:54 AM, Neha Narkhede neha.narkh...@gmail.com wrote:

As mentioned in previous emails, we are also working on a
re-implementation
of the consumer. I would like to use this email thread to discuss the
details of the public API. I would also like us to be picky about this
public api now so it is as good as possible and we don't need to break it
in the future.

The best way to get a feel for the API is actually to take a look at the
javadoc http://people.apache.org/~nehanarkhede/kafka-0.9-consumer-javadoc/doc/kafka/clients/consumer/KafkaConsumer.html,
the hope is to get the api docs good enough so that it is
self-explanatory.
You can also take a look at the configs
here http://people.apache.org/~nehanarkhede/kafka-0.9-consumer-javadoc/doc/kafka/clients/consumer/ConsumerConfig.html

Some background info on implementation:

At a high level the primary difference in this consumer is that it removes
the distinction between the high-level and low-level consumer. The new
consumer API is non blocking and instead of returning a blocking iterator,
the consumer provides a poll() API that returns a list of records. We
think
this is better compared to the blocking iterators since it effectively
decouples the threading strategy used for processing messages from the
consumer. It is worth noting that the consumer is entirely single threaded
and runs in the user thread. The advantage is that it can be easily
rewritten in less multi-threading-friendly languages. The consumer batches
data and multiplexes I/O over TCP connections to each of the brokers it
communicates with, for high throughput. The consumer also allows long poll
to reduce the end-to-end message latency for low throughput data.

The consumer provides a group management facility that supports the
concept
of a group with multiple consumer instances (just like the current
consumer). This is done through a custom heartbeat and group management
protocol transparent to the user. At the same time, it allows users the
option to subscribe to a fixed set of partitions and not use group
management at all. The offset management strategy defaults to Kafka based
offset management and the API provides a way for the user to use a
customized offset store to manage the consumer's offsets.

A key difference in this consumer also is the fact that it does not depend
on zookeeper at all.

More details about the new consumer design are
here https://cwiki.apache.org/confluence/display/KAFKA/Kafka+0.9+Consumer+Rewrite+Design

Please take a look at the new
API http://people.apache.org/~nehanarkhede/kafka-0.9-consumer-javadoc/doc/kafka/clients/consumer/KafkaConsumer.html
and give us any thoughts you may have.

Thanks,
Neha



Re: New Consumer API discussion

2014-03-03 Thread Chris Riccomini
Hey Guys,

Also, for reference, we'll be looking to implement new Samza consumers
which have these APIs:

http://samza.incubator.apache.org/learn/documentation/0.7.0/api/javadocs/or
g/apache/samza/system/SystemConsumer.html

http://samza.incubator.apache.org/learn/documentation/0.7.0/api/javadocs/or
g/apache/samza/checkpoint/CheckpointManager.html


Question (3) below is a result of having Samza's SystemConsumers poll
allow specific topic/partitions to be specified.

The split between consumer and checkpoint manager is the reason for
question (12) below.

Cheers,
Chris

On 3/3/14 10:19 AM, Chris Riccomini criccom...@linkedin.com wrote:

Hey Guys,

Sorry for the late follow up. Here are my questions/thoughts on the API:

1. Why is the config String-Object instead of String-String?

2. Are these Java docs correct?

  KafkaConsumer(java.util.Map<java.lang.String,java.lang.Object> configs)
  A consumer is instantiated by providing a set of key-value pairs as
configuration and a ConsumerRebalanceCallback implementation

There is no ConsumerRebalanceCallback parameter.

3. Would like to have a method:

  poll(long timeout, java.util.concurrent.TimeUnit timeUnit,
TopicPartition... topicAndPartitionsToPoll)

I see I can effectively do this by just fiddling with subscribe and
unsubscribe before each poll. Is this a low-overhead operation? Can I just
unsubscribe from everything after each poll, then re-subscribe to a topic
the next iteration. I would probably be doing this in a fairly tight loop.

4. The behavior of AUTO_OFFSET_RESET_CONFIG is overloaded. I think there
are use cases for decoupling what to do when no offset exists from what
to do when I'm out of range. I might want to start from smallest the
first time I run, but fail if I ever get offset out of range.

5. ENABLE_JMX could use Java docs, even though it's fairly
self-explanatory.

6. Clarity about whether FETCH_BUFFER_CONFIG is per-topic/partition, or
across all topic/partitions is useful. I believe it's per-topic/partition,
right? That is, setting to 2 megs with two TopicAndPartitions would result
in 4 megs worth of data coming in per fetch, right?

7. What does the consumer do if METADATA_FETCH_TIMEOUT_CONFIG times out?
Retry, or throw exception?

8. Does RECONNECT_BACKOFF_MS_CONFIG apply to both metadata requests and
fetch requests?

9. What does SESSION_TIMEOUT_MS default to?

10. Is this consumer thread-safe?

11. How do you use a different offset management strategy? Your email
implies that it's pluggable, but I don't see how. The offset management
strategy defaults to Kafka based offset management and the API provides a
way for the user to use a customized offset store to manage the consumer's
offsets.

12. If I wish to decouple the consumer from the offset checkpointing, is
it OK to use Joel's offset management stuff directly, rather than through
the consumer's commit API?


Cheers,
Chris

On 2/10/14 10:54 AM, Neha Narkhede neha.narkh...@gmail.com wrote:

As mentioned in previous emails, we are also working on a
re-implementation
of the consumer. I would like to use this email thread to discuss the
details of the public API. I would also like us to be picky about this
public api now so it is as good as possible and we don't need to break it
in the future.

The best way to get a feel for the API is actually to take a look at the
javadoc http://people.apache.org/~nehanarkhede/kafka-0.9-consumer-javadoc/doc/kafka/clients/consumer/KafkaConsumer.html,
the hope is to get the api docs good enough so that it is
self-explanatory.
You can also take a look at the configs
here http://people.apache.org/~nehanarkhede/kafka-0.9-consumer-javadoc/doc/kafka/clients/consumer/ConsumerConfig.html

Some background info on implementation:

At a high level the primary difference in this consumer is that it
removes
the distinction between the high-level and low-level consumer. The
new
consumer API is non blocking and instead of returning a blocking
iterator,
the consumer provides a poll() API that returns a list of records. We
think
this is better compared to the blocking iterators since it effectively
decouples the threading strategy used for processing messages from the
consumer. It is worth noting that the consumer is entirely single
threaded
and runs in the user thread. The advantage is that it can be easily
rewritten in less multi-threading-friendly languages. The consumer
batches
data and multiplexes I/O over TCP connections to each of the brokers it
communicates with, for high throughput. The consumer also allows long
poll
to reduce the end-to-end message latency for low throughput data.

The consumer provides a group management facility that supports the
concept
of a group with multiple consumer instances (just like the current
consumer). This is done through a custom heartbeat and group management
protocol transparent to the user. At the same time, it allows users the
option to subscribe to a fixed set of partitions and not use group
management at all

Re: New Consumer API discussion

2014-02-28 Thread Neha Narkhede
1. The new
consumer API is non blocking and instead of returning a blocking iterator,
the consumer provides a poll() API that returns a list of records. 

So this means the consumer polls, and if there are new messages it pulls
them down and then disconnects?

Not really. The point I was trying to make is that the consumer now just
returns
a list of records instead of an iterator. If there are no more messages
available,
it returns an empty list of records. Under the covers, it keeps a
connection open
to every broker.

2.
 The consumer also allows long poll
to reduce the end-to-end message latency for low throughput data.

How is this different than blocking?  Is it event-based, meaning it keeps a
long-poll connection open, and if/when a new message arrives it triggers an
event on the consumer side?

It means that you can invoke poll with a timeout. If a message is available
before
the timeout is hit, it returns earlier.


3.
 The consumer batches
data and multiplexes I/O over TCP connections to each of the brokers it
communicates with, for high throughput. 

If it is single threaded, does each TCP broker connection block?  Not sure
I understand how this works if it is single threaded.

Take a look at this tutorial that explains non-blocking socket I/O:
http://rox-xmlrpc.sourceforge.net/niotut/

Thanks,
Neha
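
Put together, points 1 and 2 boil down to a loop like the one below (a sketch
against the draft javadoc discussed in this thread, so signatures may shift;
process() is a placeholder for application code):

    // Connections to the brokers stay open across iterations; the timeout is
    // just how long poll() may wait before handing back an empty batch.
    while (true) {
        List<ConsumerRecord> records = consumer.poll(100, TimeUnit.MILLISECONDS);
        if (records.isEmpty())
            continue;                // nothing arrived within the timeout
        for (ConsumerRecord record : records)
            process(record);         // processing runs in this same user thread
    }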


On Fri, Feb 28, 2014 at 12:44 PM, S Ahmed sahmed1...@gmail.com wrote:

 Few clarifications:

 1. The new
 consumer API is non blocking and instead of returning a blocking iterator,
 the consumer provides a poll() API that returns a list of records. 

 So this means the consumer polls, and if there are new messages it pulls
 them down and then disconnects?

 2.
  The consumer also allows long poll
 to reduce the end-to-end message latency for low throughput data.

 How is this different than blocking?  Is it event-based, meaning it keeps a
 long-poll connection open, and if/when a new message arrives it triggers an
 event on the consumer side?


 3.
  The consumer batches
 data and multiplexes I/O over TCP connections to each of the brokers it
 communicates with, for high throughput. 

 If it is single threaded, does each TCP broker connection block?  Not sure
 I understand how this works if it is single threaded.



 On Thu, Feb 27, 2014 at 11:38 PM, Robert Withers 
 robert.w.with...@gmail.com
  wrote:

  Thank you, Neha, that makes it clear.  Really, the aspect of all this
 that
  we could really use is a way to do exactly once processing.  We are
 looking
  at more critical data.  What are the latest thoughts on how to achieve
  exactly once and how might that affect a consumer API?
 
  Thanks,
  Rob
 
  On Feb 27, 2014, at 10:29 AM, Neha Narkhede neha.narkh...@gmail.com
  wrote:
 
   Is this
 
 http://people.apache.org/~nehanarkhede/kafka-0.9-consumer-javadoc/doc/kafka/clients/consumer/KafkaConsumer.html#seek%28kafka.common.TopicPartitionOffset...%29
  what
   you are looking for? Basically, I think from the overall feedback, it
   looks like code snippets don't seem to work for overall understanding
 of
   the APIs. I plan to update the javadoc with more complete examples that
   have been discussed so far on this thread and generally on the mailing
  list.
  
   Thanks,
   Neha
  
  
  
  
   On Thu, Feb 27, 2014 at 4:17 AM, Robert Withers
   robert.w.with...@gmail.comwrote:
  
   Neha,
  
   I see how one might wish to implement onPartitionsAssigned and
   onPartitionsRevoked, but I don't have a sense for how I might supply
  these
   implementations to a running consumer.  What would the setup code look
  like
   to start a high-level consumer with these provided implementations?
  
   thanks,
   Rob
  
  
   On Feb 27, 2014, at 3:48 AM, Neha Narkhede neha.narkh...@gmail.com
   wrote:
  
   Rob,
  
   The use of the callbacks is explained in the javadoc here -
  
  
 
 http://people.apache.org/~nehanarkhede/kafka-0.9-consumer-javadoc/doc/kafka/clients/consumer/ConsumerRebalanceCallback.html
  
   Let me know if it makes sense. The hope is to improve the javadoc so
  that
   it is self explanatory.
  
   Thanks,
   Neha
  
  
   On Wed, Feb 26, 2014 at 9:16 AM, Robert Withers
   robert.w.with...@gmail.comwrote:
  
   Neha, what does the use of the RebalanceBeginCallback and
   RebalanceEndCallback look like?
  
   thanks,
   Rob
  
   On Feb 25, 2014, at 3:51 PM, Neha Narkhede neha.narkh...@gmail.com
 
   wrote:
  
   How do you know n? The whole point is that you need to be able to
  fetch
   the
   end offset. You can't a priori decide you will load 1m messages
  without
   knowing what is there.
  
   Hmm. I think what you are pointing out is that in the new consumer
  API,
   we
   don't have a way to issue the equivalent of the existing
   getOffsetsBefore()
   API. Agree that is a flaw that we should fix.
  
   Will update the docs/wiki with a few use cases that I've collected
 so
   far
   and see if the API covers those.
  
   I would prefer

Re: New Consumer API discussion

2014-02-27 Thread Neha Narkhede
Rob,

The use of the callbacks is explained in the javadoc here -
http://people.apache.org/~nehanarkhede/kafka-0.9-consumer-javadoc/doc/kafka/clients/consumer/ConsumerRebalanceCallback.html

Let me know if it makes sense. The hope is to improve the javadoc so that
it is self explanatory.

Thanks,
Neha


On Wed, Feb 26, 2014 at 9:16 AM, Robert Withers
robert.w.with...@gmail.com wrote:

 Neha, what does the use of the RebalanceBeginCallback and
 RebalanceEndCallback look like?

 thanks,
 Rob

 On Feb 25, 2014, at 3:51 PM, Neha Narkhede neha.narkh...@gmail.com
 wrote:

  How do you know n? The whole point is that you need to be able to fetch
 the
  end offset. You can't a priori decide you will load 1m messages without
  knowing what is there.
 
  Hmm. I think what you are pointing out is that in the new consumer API,
 we
  don't have a way to issue the equivalent of the existing
 getOffsetsBefore()
  API. Agree that is a flaw that we should fix.
 
  Will update the docs/wiki with a few use cases that I've collected so far
  and see if the API covers those.
 
  I would prefer PartitionsAssigned and PartitionsRevoked as that seems
  clearer to me
 
  Well the RebalanceBeginCallback interface will have
 onPartitionsAssigned()
  as the callback. Similarly, the RebalanceEndCallback interface will have
  onPartitionsRevoked() as the callback. Makes sense?
 
  Thanks,
  Neha
 
 
  On Tue, Feb 25, 2014 at 2:38 PM, Jay Kreps jay.kr...@gmail.com wrote:
 
  1. I would prefer PartitionsAssigned and PartitionsRevoked as that seems
  clearer to me.
 
  -Jay
 
 
  On Tue, Feb 25, 2014 at 10:19 AM, Neha Narkhede 
 neha.narkh...@gmail.com
  wrote:
 
  Thanks for the reviews so far! There are a few outstanding questions -
 
  1.  It will be good to make the rebalance callbacks forward compatible
  with
  Java 8 capabilities. We can change it to PartitionsAssignedCallback
  and PartitionsRevokedCallback or RebalanceBeginCallback and
  RebalanceEndCallback?
 
  If there are no objections, I will change it to RebalanceBeginCallback
  and
  RebalanceEndCallback.
 
   2.  The return type for committed() is List<TopicPartitionOffset>. There
   was a suggestion to change it to either be Map<TopicPartition,Long> or
   Map<TopicPartition, TopicPartitionOffset>
 
  Do people have feedback on this suggestion?
 
 
  On Tue, Feb 25, 2014 at 9:56 AM, Neha Narkhede 
 neha.narkh...@gmail.com
  wrote:
 
  Robert,
 
  Are you saying it is possible to get events from the high-level
  consumerregarding various state machine changes?  For instance, can we
  get a
  notification when a rebalance starts and ends, when a partition is
  assigned/unassigned, when an offset is committed on a partition, when
 a
  leader changes and so on?  I call this OOB traffic, since they are not
  the
  core messages streaming, but side-band events, yet they are still
  potentially useful to consumers.
 
  In the current proposal, you get notified when the state machine
  changes
  i.e. before and after a rebalance is triggered. Look at
  ConsumerRebalanceCallback
 
 
 http://people.apache.org/~nehanarkhede/kafka-0.9-consumer-javadoc/doc/kafka/clients/consumer/ConsumerRebalanceCallback.html
 
  .Leader changes do not count as state machine changes for consumer
  rebalance purposes.
 
  Thanks,
  Neha
 
 
  On Tue, Feb 25, 2014 at 9:54 AM, Neha Narkhede 
  neha.narkh...@gmail.com
  wrote:
 
  Jay/Robert -
 
 
  I think what Robert is saying is that we need to think through the
  offset
  API to enable batch processing of topic data. Think of a process
  that
  periodically kicks off to compute a data summary or do a data load or
  something like that. I think what we need to support this is an api
 to
  fetch the last offset from the server for a partition. Something like
long lastOffset(TopicPartition tp)
  and for symmetry
long firstOffset(TopicPartition tp)
 
  Likely this would have to be batched.
 
  A fixed range of data load can be done using the existing APIs as
  follows. This assumes you know the endOffset which can be
  currentOffset
  + n
  (number of messages in the load)
 
   long startOffset = consumer.position(partition);
   long endOffset = startOffset + n;
   while (consumer.position(partition) <= endOffset) {
       List<ConsumerRecord> messages = consumer.poll(timeout, TimeUnit.MILLISECONDS);
       process(messages, endOffset);  // processes messages until endOffset
   }
 
  Does that make sense?
 
 
  On Tue, Feb 25, 2014 at 9:49 AM, Neha Narkhede 
  neha.narkh...@gmail.com
  wrote:
 
  Thanks for the review, Jun. Here are some comments -
 
 
   1. The use of ellipsis: This may make passing a list of items from a
   collection to the api a bit harder. Suppose that you have a list of
   topics stored in

   ArrayList<String> topics;

   If you want to subscribe to all topics in one call, you will have to do:

   String[] topicArray = new String[topics.size()];
   consumer.subscribe(topics.toArray(topicArray));
 
  A similar argument can

Re: New Consumer API discussion

2014-02-27 Thread Robert Withers
Neha,

I see how one might wish to implement onPartitionsAssigned and 
onPartitionsRevoked, but I don’t have a sense for how I might supply these 
implementations to a running consumer.  What would the setup code look like to 
start a high-level consumer with these provided implementations?

thanks,
Rob
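
For what it's worth, the draft javadoc suggests the wiring looks roughly like
the sketch below: the callback is handed to the consumer up front and invoked
around each rebalance. Signatures follow the draft discussed in this thread
and may well change, and the config values are made up:

    Properties config = new Properties();
    config.put("bootstrap.servers", "broker1:9092");
    config.put("group.id", "analyzer-group");

    KafkaConsumer consumer = new KafkaConsumer(config, new ConsumerRebalanceCallback() {
        public void onPartitionsAssigned(Consumer consumer, TopicPartition... partitions) {
            // e.g. seek each newly owned partition to an externally stored offset
        }
        public void onPartitionsRevoked(Consumer consumer, TopicPartition... partitions) {
            // e.g. flush work and commit offsets for partitions about to move away
        }
    });
    consumer.subscribe("events");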


On Feb 27, 2014, at 3:48 AM, Neha Narkhede neha.narkh...@gmail.com wrote:

 Rob,
 
 The use of the callbacks is explained in the javadoc here -
 http://people.apache.org/~nehanarkhede/kafka-0.9-consumer-javadoc/doc/kafka/clients/consumer/ConsumerRebalanceCallback.html
 
 Let me know if it makes sense. The hope is to improve the javadoc so that
 it is self explanatory.
 
 Thanks,
 Neha
 
 
 On Wed, Feb 26, 2014 at 9:16 AM, Robert Withers
 robert.w.with...@gmail.comwrote:
 
 Neha, what does the use of the RebalanceBeginCallback and
 RebalanceEndCallback look like?
 
 thanks,
 Rob
 
 On Feb 25, 2014, at 3:51 PM, Neha Narkhede neha.narkh...@gmail.com
 wrote:
 
 How do you know n? The whole point is that you need to be able to fetch
 the
 end offset. You can't a priori decide you will load 1m messages without
 knowing what is there.
 
 Hmm. I think what you are pointing out is that in the new consumer API,
 we
 don't have a way to issue the equivalent of the existing
 getOffsetsBefore()
 API. Agree that is a flaw that we should fix.
 
 Will update the docs/wiki with a few use cases that I've collected so far
 and see if the API covers those.
 
 I would prefer PartitionsAssigned and PartitionsRevoked as that seems
 clearer to me
 
 Well the RebalanceBeginCallback interface will have
 onPartitionsAssigned()
 as the callback. Similarly, the RebalanceEndCallback interface will have
 onPartitionsRevoked() as the callback. Makes sense?
 
 Thanks,
 Neha
 
 
 On Tue, Feb 25, 2014 at 2:38 PM, Jay Kreps jay.kr...@gmail.com wrote:
 
 1. I would prefer PartitionsAssigned and PartitionsRevoked as that seems
 clearer to me.
 
 -Jay
 
 
 On Tue, Feb 25, 2014 at 10:19 AM, Neha Narkhede 
 neha.narkh...@gmail.com
 wrote:
 
 Thanks for the reviews so far! There are a few outstanding questions -
 
 1.  It will be good to make the rebalance callbacks forward compatible
 with
 Java 8 capabilities. We can change it to PartitionsAssignedCallback
 and PartitionsRevokedCallback or RebalanceBeginCallback and
 RebalanceEndCallback?
 
 If there are no objections, I will change it to RebalanceBeginCallback
 and
 RebalanceEndCallback.
 
  2.  The return type for committed() is List<TopicPartitionOffset>. There
  was a suggestion to change it to either be Map<TopicPartition,Long> or
  Map<TopicPartition, TopicPartitionOffset>
 
 Do people have feedback on this suggestion?
 
 
 On Tue, Feb 25, 2014 at 9:56 AM, Neha Narkhede 
 neha.narkh...@gmail.com
 wrote:
 
 Robert,
 
 Are you saying it is possible to get events from the high-level
 consumerregarding various state machine changes?  For instance, can we
 get a
 notification when a rebalance starts and ends, when a partition is
 assigned/unassigned, when an offset is committed on a partition, when
 a
 leader changes and so on?  I call this OOB traffic, since they are not
 the
 core messages streaming, but side-band events, yet they are still
 potentially useful to consumers.
 
 In the current proposal, you get notified when the state machine
 changes
 i.e. before and after a rebalance is triggered. Look at
 ConsumerRebalanceCallback
 
 
 http://people.apache.org/~nehanarkhede/kafka-0.9-consumer-javadoc/doc/kafka/clients/consumer/ConsumerRebalanceCallback.html
 
 .Leader changes do not count as state machine changes for consumer
 rebalance purposes.
 
 Thanks,
 Neha
 
 
 On Tue, Feb 25, 2014 at 9:54 AM, Neha Narkhede 
 neha.narkh...@gmail.com
 wrote:
 
 Jay/Robert -
 
 
 I think what Robert is saying is that we need to think through the
 offset
 API to enable batch processing of topic data. Think of a process
 that
 periodically kicks off to compute a data summary or do a data load or
 something like that. I think what we need to support this is an api
 to
 fetch the last offset from the server for a partition. Something like
  long lastOffset(TopicPartition tp)
 and for symmetry
  long firstOffset(TopicPartition tp)
 
 Likely this would have to be batched.
 
 A fixed range of data load can be done using the existing APIs as
 follows. This assumes you know the endOffset which can be
 currentOffset
 + n
 (number of messages in the load)
 
  long startOffset = consumer.position(partition);
  long endOffset = startOffset + n;
  while (consumer.position(partition) <= endOffset) {
      List<ConsumerRecord> messages = consumer.poll(timeout, TimeUnit.MILLISECONDS);
      process(messages, endOffset);  // processes messages until endOffset
  }
 
 Does that make sense?
 
 
 On Tue, Feb 25, 2014 at 9:49 AM, Neha Narkhede 
 neha.narkh...@gmail.com
 wrote:
 
 Thanks for the review, Jun. Here are some comments -
 
 
  1. The use of ellipsis: This may make passing a list of items from a
  collection to the api

Re: New Consumer API discussion

2014-02-27 Thread Neha Narkhede
Is this
http://people.apache.org/~nehanarkhede/kafka-0.9-consumer-javadoc/doc/kafka/clients/consumer/KafkaConsumer.html#seek%28kafka.common.TopicPartitionOffset...%29
what you are looking for? Basically, I think from the overall feedback, it
looks like code snippets don't seem to work for overall understanding of
the APIs. I plan to update the javadoc with more complete examples that
have been discussed so far on this thread and generally on the mailing list.

Thanks,
Neha




On Thu, Feb 27, 2014 at 4:17 AM, Robert Withers
robert.w.with...@gmail.com wrote:

 Neha,

 I see how one might wish to implement onPartitionsAssigned and
 onPartitionsRevoked, but I don't have a sense for how I might supply these
 implementations to a running consumer.  What would the setup code look like
 to start a high-level consumer with these provided implementations?

 thanks,
 Rob


 On Feb 27, 2014, at 3:48 AM, Neha Narkhede neha.narkh...@gmail.com
 wrote:

  Rob,
 
  The use of the callbacks is explained in the javadoc here -
 
 http://people.apache.org/~nehanarkhede/kafka-0.9-consumer-javadoc/doc/kafka/clients/consumer/ConsumerRebalanceCallback.html
 
  Let me know if it makes sense. The hope is to improve the javadoc so that
  it is self explanatory.
 
  Thanks,
  Neha
 
 
  On Wed, Feb 26, 2014 at 9:16 AM, Robert Withers
  robert.w.with...@gmail.comwrote:
 
  Neha, what does the use of the RebalanceBeginCallback and
  RebalanceEndCallback look like?
 
  thanks,
  Rob
 
  On Feb 25, 2014, at 3:51 PM, Neha Narkhede neha.narkh...@gmail.com
  wrote:
 
  How do you know n? The whole point is that you need to be able to fetch
  the
  end offset. You can't a priori decide you will load 1m messages without
  knowing what is there.
 
  Hmm. I think what you are pointing out is that in the new consumer API,
  we
  don't have a way to issue the equivalent of the existing
  getOffsetsBefore()
  API. Agree that is a flaw that we should fix.
 
  Will update the docs/wiki with a few use cases that I've collected so
 far
  and see if the API covers those.
 
  I would prefer PartitionsAssigned and PartitionsRevoked as that seems
  clearer to me
 
  Well the RebalanceBeginCallback interface will have
  onPartitionsAssigned()
  as the callback. Similarly, the RebalanceEndCallback interface will
 have
  onPartitionsRevoked() as the callback. Makes sense?
 
  Thanks,
  Neha
 
 
  On Tue, Feb 25, 2014 at 2:38 PM, Jay Kreps jay.kr...@gmail.com
 wrote:
 
  1. I would prefer PartitionsAssigned and PartitionsRevoked as that
 seems
  clearer to me.
 
  -Jay
 
 
  On Tue, Feb 25, 2014 at 10:19 AM, Neha Narkhede 
  neha.narkh...@gmail.com
  wrote:
 
  Thanks for the reviews so far! There are a few outstanding questions
 -
 
  1.  It will be good to make the rebalance callbacks forward
 compatible
  with
  Java 8 capabilities. We can change it to PartitionsAssignedCallback
  and PartitionsRevokedCallback or RebalanceBeginCallback and
  RebalanceEndCallback?
 
  If there are no objections, I will change it to
 RebalanceBeginCallback
  and
  RebalanceEndCallback.
 
   2.  The return type for committed() is List<TopicPartitionOffset>. There
   was a suggestion to change it to either be Map<TopicPartition,Long> or
   Map<TopicPartition, TopicPartitionOffset>
 
  Do people have feedback on this suggestion?
 
 
  On Tue, Feb 25, 2014 at 9:56 AM, Neha Narkhede 
  neha.narkh...@gmail.com
  wrote:
 
  Robert,
 
  Are you saying it is possible to get events from the high-level
  consumerregarding various state machine changes?  For instance, can
 we
  get a
  notification when a rebalance starts and ends, when a partition is
  assigned/unassigned, when an offset is committed on a partition,
 when
  a
  leader changes and so on?  I call this OOB traffic, since they are
 not
  the
  core messages streaming, but side-band events, yet they are still
  potentially useful to consumers.
 
  In the current proposal, you get notified when the state machine
  changes
  i.e. before and after a rebalance is triggered. Look at
  ConsumerRebalanceCallback
 
 
 
 http://people.apache.org/~nehanarkhede/kafka-0.9-consumer-javadoc/doc/kafka/clients/consumer/ConsumerRebalanceCallback.html
 
  .Leader changes do not count as state machine changes for consumer
  rebalance purposes.
 
  Thanks,
  Neha
 
 
  On Tue, Feb 25, 2014 at 9:54 AM, Neha Narkhede 
  neha.narkh...@gmail.com
  wrote:
 
  Jay/Robert -
 
 
  I think what Robert is saying is that we need to think through the
  offset
  API to enable batch processing of topic data. Think of a process
  that
  periodically kicks off to compute a data summary or do a data load
 or
  something like that. I think what we need to support this is an api
  to
  fetch the last offset from the server for a partition. Something
 like
   long lastOffset(TopicPartition tp)
  and for symmetry
   long firstOffset(TopicPartition tp)
 
  Likely this would have to be batched.
 
  A fixed range of data load can be done using the existing APIs as
  follows

Re: New Consumer API discussion

2014-02-27 Thread Robert Withers
Thank you, Neha, that makes it clear.  Really, the aspect of all this that we 
could really use is a way to do exactly once processing.  We are looking at 
more critical data.  What are the latest thoughts on how to achieve exactly 
once and how might that affect a consumer API?

Thanks,
Rob

On Feb 27, 2014, at 10:29 AM, Neha Narkhede neha.narkh...@gmail.com wrote:

 Is this
 http://people.apache.org/~nehanarkhede/kafka-0.9-consumer-javadoc/doc/kafka/clients/consumer/KafkaConsumer.html#seek%28kafka.common.TopicPartitionOffset...%29
 what you are looking for? Basically, I think from the overall feedback, it
 looks like code snippets don't seem to work for overall understanding of
 the APIs. I plan to update the javadoc with more complete examples that
 have been discussed so far on this thread and generally on the mailing list.
 
 Thanks,
 Neha
 
 
 
 
 On Thu, Feb 27, 2014 at 4:17 AM, Robert Withers
 robert.w.with...@gmail.comwrote:
 
 Neha,
 
 I see how one might wish to implement onPartitionsAssigned and
 onPartitionsRevoked, but I don't have a sense for how I might supply these
 implementations to a running consumer.  What would the setup code look like
 to start a high-level consumer with these provided implementations?
 
 thanks,
 Rob
 
 
 On Feb 27, 2014, at 3:48 AM, Neha Narkhede neha.narkh...@gmail.com
 wrote:
 
 Rob,
 
 The use of the callbacks is explained in the javadoc here -
 
 http://people.apache.org/~nehanarkhede/kafka-0.9-consumer-javadoc/doc/kafka/clients/consumer/ConsumerRebalanceCallback.html
 
 Let me know if it makes sense. The hope is to improve the javadoc so that
 it is self explanatory.
 
 Thanks,
 Neha
 
 
 On Wed, Feb 26, 2014 at 9:16 AM, Robert Withers
 robert.w.with...@gmail.comwrote:
 
 Neha, what does the use of the RebalanceBeginCallback and
 RebalanceEndCallback look like?
 
 thanks,
 Rob
 
 On Feb 25, 2014, at 3:51 PM, Neha Narkhede neha.narkh...@gmail.com
 wrote:
 
 How do you know n? The whole point is that you need to be able to fetch
 the
 end offset. You can't a priori decide you will load 1m messages without
 knowing what is there.
 
 Hmm. I think what you are pointing out is that in the new consumer API,
 we
 don't have a way to issue the equivalent of the existing
 getOffsetsBefore()
 API. Agree that is a flaw that we should fix.
 
 Will update the docs/wiki with a few use cases that I've collected so
 far
 and see if the API covers those.
 
 I would prefer PartitionsAssigned and PartitionsRevoked as that seems
 clearer to me
 
 Well the RebalanceBeginCallback interface will have
 onPartitionsAssigned()
 as the callback. Similarly, the RebalanceEndCallback interface will
 have
 onPartitionsRevoked() as the callback. Makes sense?
 
 Thanks,
 Neha
 
 
 On Tue, Feb 25, 2014 at 2:38 PM, Jay Kreps jay.kr...@gmail.com
 wrote:
 
 1. I would prefer PartitionsAssigned and PartitionsRevoked as that
 seems
 clearer to me.
 
 -Jay
 
 
 On Tue, Feb 25, 2014 at 10:19 AM, Neha Narkhede 
 neha.narkh...@gmail.com
 wrote:
 
 Thanks for the reviews so far! There are a few outstanding questions
 -
 
 1.  It will be good to make the rebalance callbacks forward
 compatible
 with
 Java 8 capabilities. We can change it to PartitionsAssignedCallback
 and PartitionsRevokedCallback or RebalanceBeginCallback and
 RebalanceEndCallback?
 
 If there are no objections, I will change it to
 RebalanceBeginCallback
 and
 RebalanceEndCallback.
 
  2.  The return type for committed() is List<TopicPartitionOffset>. There
  was a suggestion to change it to either be Map<TopicPartition,Long> or
  Map<TopicPartition, TopicPartitionOffset>
 
 Do people have feedback on this suggestion?
 
 
 On Tue, Feb 25, 2014 at 9:56 AM, Neha Narkhede 
 neha.narkh...@gmail.com
 wrote:
 
 Robert,
 
 Are you saying it is possible to get events from the high-level
 consumerregarding various state machine changes?  For instance, can
 we
 get a
 notification when a rebalance starts and ends, when a partition is
 assigned/unassigned, when an offset is committed on a partition,
 when
 a
 leader changes and so on?  I call this OOB traffic, since they are
 not
 the
 core messages streaming, but side-band events, yet they are still
 potentially useful to consumers.
 
 In the current proposal, you get notified when the state machine
 changes
 i.e. before and after a rebalance is triggered. Look at
 ConsumerRebalanceCallback
 
 
 
 http://people.apache.org/~nehanarkhede/kafka-0.9-consumer-javadoc/doc/kafka/clients/consumer/ConsumerRebalanceCallback.html
 
 .Leader changes do not count as state machine changes for consumer
 rebalance purposes.
 
 Thanks,
 Neha
 
 
 On Tue, Feb 25, 2014 at 9:54 AM, Neha Narkhede 
 neha.narkh...@gmail.com
 wrote:
 
 Jay/Robert -
 
 
 I think what Robert is saying is that we need to think through the
 offset
 API to enable batch processing of topic data. Think of a process
 that
 periodically kicks off to compute a data summary or do a data load
 or
 something like that. I think what we need to support

Re: New Consumer API discussion

2014-02-26 Thread Robert Withers
Neha, what does the use of the RebalanceBeginCallback and RebalanceEndCallback 
look like?

thanks,
Rob

On Feb 25, 2014, at 3:51 PM, Neha Narkhede neha.narkh...@gmail.com wrote:

 How do you know n? The whole point is that you need to be able to fetch the
 end offset. You can't a priori decide you will load 1m messages without
 knowing what is there.
 
 Hmm. I think what you are pointing out is that in the new consumer API, we
 don't have a way to issue the equivalent of the existing getOffsetsBefore()
 API. Agree that is a flaw that we should fix.
 
 Will update the docs/wiki with a few use cases that I've collected so far
 and see if the API covers those.
 
 I would prefer PartitionsAssigned and PartitionsRevoked as that seems
 clearer to me
 
 Well the RebalanceBeginCallback interface will have onPartitionsAssigned()
 as the callback. Similarly, the RebalanceEndCallback interface will have
 onPartitionsRevoked() as the callback. Makes sense?
 
 Thanks,
 Neha
 
 
 On Tue, Feb 25, 2014 at 2:38 PM, Jay Kreps jay.kr...@gmail.com wrote:
 
 1. I would prefer PartitionsAssigned and PartitionsRevoked as that seems
 clearer to me.
 
 -Jay
 
 
 On Tue, Feb 25, 2014 at 10:19 AM, Neha Narkhede neha.narkh...@gmail.com
 wrote:
 
 Thanks for the reviews so far! There are a few outstanding questions -
 
 1.  It will be good to make the rebalance callbacks forward compatible
 with
 Java 8 capabilities. We can change it to PartitionsAssignedCallback
 and PartitionsRevokedCallback or RebalanceBeginCallback and
 RebalanceEndCallback?
 
 If there are no objections, I will change it to RebalanceBeginCallback
 and
 RebalanceEndCallback.
 
  2.  The return type for committed() is List<TopicPartitionOffset>. There
  was a suggestion to change it to either be Map<TopicPartition,Long> or
  Map<TopicPartition, TopicPartitionOffset>
 
 Do people have feedback on this suggestion?
 
 
 On Tue, Feb 25, 2014 at 9:56 AM, Neha Narkhede neha.narkh...@gmail.com
 wrote:
 
 Robert,
 
 Are you saying it is possible to get events from the high-level
 consumerregarding various state machine changes?  For instance, can we
 get a
 notification when a rebalance starts and ends, when a partition is
 assigned/unassigned, when an offset is committed on a partition, when a
 leader changes and so on?  I call this OOB traffic, since they are not
 the
 core messages streaming, but side-band events, yet they are still
 potentially useful to consumers.
 
 In the current proposal, you get notified when the state machine
 changes
 i.e. before and after a rebalance is triggered. Look at
 ConsumerRebalanceCallback
 
 http://people.apache.org/~nehanarkhede/kafka-0.9-consumer-javadoc/doc/kafka/clients/consumer/ConsumerRebalanceCallback.html
 
 .Leader changes do not count as state machine changes for consumer
 rebalance purposes.
 
 Thanks,
 Neha
 
 
 On Tue, Feb 25, 2014 at 9:54 AM, Neha Narkhede 
 neha.narkh...@gmail.com
 wrote:
 
 Jay/Robert -
 
 
 I think what Robert is saying is that we need to think through the
 offset
 API to enable batch processing of topic data. Think of a process
 that
 periodically kicks off to compute a data summary or do a data load or
 something like that. I think what we need to support this is an api to
 fetch the last offset from the server for a partition. Something like
   long lastOffset(TopicPartition tp)
 and for symmetry
   long firstOffset(TopicPartition tp)
 
 Likely this would have to be batched.
 
 A fixed range of data load can be done using the existing APIs as
 follows. This assumes you know the endOffset which can be
 currentOffset
 + n
 (number of messages in the load)
 
  long startOffset = consumer.position(partition);
  long endOffset = startOffset + n;
  while (consumer.position(partition) <= endOffset) {
      List<ConsumerRecord> messages = consumer.poll(timeout, TimeUnit.MILLISECONDS);
      process(messages, endOffset);  // processes messages until endOffset
  }
 
 Does that make sense?
 
 
 On Tue, Feb 25, 2014 at 9:49 AM, Neha Narkhede 
 neha.narkh...@gmail.com
 wrote:
 
 Thanks for the review, Jun. Here are some comments -
 
 
  1. The use of ellipsis: This may make passing a list of items from a
  collection to the api a bit harder. Suppose that you have a list of
  topics stored in

  ArrayList<String> topics;

  If you want to subscribe to all topics in one call, you will have to do:

  String[] topicArray = new String[topics.size()];
  consumer.subscribe(topics.toArray(topicArray));
 
 A similar argument can be made for arguably the more common use case
 of
 subscribing to a single topic as well. In these cases, user is
 required
 to write more
 code to create a single item collection and pass it in. Since
 subscription is extremely lightweight
 invoking it multiple times also seems like a workable solution, no?
 
 2. It would be good to document that the following apis are mutually
 exclusive. Also, if the partition level subscription is specified,
 there
 is
 no group management. Finally, unsubscribe() can

Re: New Consumer API discussion

2014-02-25 Thread Neha Narkhede
Thanks for the review, Jun. Here are some comments -

1. The use of ellipsis: This may make passing a list of items from a
collection to the api a bit harder. Suppose that you have a list of topics
stored in

ArrayList<String> topics;

If you want to subscribe to all topics in one call, you will have to do:

String[] topicArray = new String[topics.size()];
consumer.subscribe(topics.toArray(topicArray));

A similar argument can be made for arguably the more common use case of
subscribing to a single topic as well. In these cases, user is required to
write more
code to create a single item collection and pass it in. Since subscription
is extremely lightweight
invoking it multiple times also seems like a workable solution, no?
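
To illustrate the two options side by side (draft API, hypothetical snippet):

ArrayList<String> topics = ...;

// Option 1: copy into an array to satisfy the varargs signature
consumer.subscribe(topics.toArray(new String[topics.size()]));

// Option 2: subscription is cheap, so just call subscribe() once per topic
for (String topic : topics)
    consumer.subscribe(topic);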

2. It would be good to document that the following apis are mutually
exclusive. Also, if the partition level subscription is specified, there is
no group management. Finally, unsubscribe() can only be used to cancel
subscriptions with the same pattern. For example, you can't unsubscribe at
the partition level if the subscription is done at the topic level.

subscribe(java.lang.String... topics)
subscribe(java.lang.String topic, int... partitions)

Makes sense. Made the suggested improvements to the docs:
http://people.apache.org/~nehanarkhede/kafka-0.9-consumer-javadoc/doc/kafka/clients/consumer/Consumer.html#subscribe%28java.lang.String...%29

3.commit(): The following comment in the doc should probably say commit
offsets for partitions assigned to this consumer.

 If no partitions are specified, commits offsets for the subscribed list of
topics and partitions to Kafka.

Could you give more context on this suggestion? Here is the entire doc -

Synchronously commits the specified offsets for the specified list of
topics and partitions to *Kafka*. If no partitions are specified, commits
offsets for the subscribed list of topics and partitions.

The hope is to convey that if no partitions are specified, offsets will be
committed for the subscribed list of partitions. One improvement could be to
explicitly state that the offsets returned on the last poll will be
committed. I updated this to -

Synchronously commits the specified offsets for the specified list of
topics and partitions to *Kafka*. If no offsets are specified, commits
offsets returned on the last {@link #poll(long, TimeUnit) poll()} for the
subscribed list of topics and partitions.
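
Roughly, the two flavors being described read as follows (draft API; the
TopicPartitionOffset constructor arguments are illustrative):

// Commit the positions returned by the last poll() for everything subscribed
consumer.commit();

// Or commit explicit offsets for specific partitions
consumer.commit(new TopicPartitionOffset("events", 0, lastProcessedOffset));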

4. There is inconsistency in specifying partitions. Sometimes we use
TopicPartition and some other times we use String and int (see
examples below).

void onPartitionsAssigned(Consumer consumer, TopicPartition...partitions)

public void *subscribe*(java.lang.String topic, int... partitions)

Yes, this was discussed previously. I think generally the consensus seems
to be to use the higher level
classes everywhere. Made those changes.

What's the use case of position()? Isn't that just the nextOffset() on the
last message returned from poll()?

Yes, except in the case where a rebalance is triggered and poll() is not
yet invoked. Here, you would use position() to get the new fetch position
for the specific partition. Even if this is not a common use case, IMO it
is much easier to use position() to get the fetch offset than invoking
nextOffset() on the last message. This also keeps the APIs symmetric, which
is nice.
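
As a concrete illustration of that corner case (sketched against the draft
API, with println standing in for real handling):

public void onPartitionsAssigned(Consumer consumer, TopicPartition... partitions) {
    for (TopicPartition tp : partitions) {
        // poll() has not run yet for these partitions, so position() is the
        // way to learn where fetching will resume
        long fetchOffset = consumer.position(tp);
        System.out.println("Now owning " + tp + " starting at offset " + fetchOffset);
    }
}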




On Mon, Feb 24, 2014 at 7:06 PM, Withers, Robert robert.with...@dish.com wrote:

 That's wonderful.  Thanks for kafka.

 Rob

 On Feb 24, 2014, at 9:58 AM, Guozhang Wang wangg...@gmail.com wrote:

 Hi Robert,

 Yes, you can check out the callback functions in the new API

 onPartitionDesigned
 onPartitionAssigned

 and see if they meet your needs.

 Guozhang


 On Mon, Feb 24, 2014 at 8:18 AM, Withers, Robert robert.with...@dish.com wrote:

 Jun,

 Are you saying it is possible to get events from the high-level consumer
 regarding various state machine changes?  For instance, can we get a
 notification when a rebalance starts and ends, when a partition is
 assigned/unassigned, when an offset is committed on a partition, when a
 leader changes and so on?  I call this OOB traffic, since they are not the
 core messages streaming, but side-band events, yet they are still
 potentially useful to consumers.

 Thank you,
 Robert


 Robert Withers
 Staff Analyst/Developer
 o: (720) 514-8963
 c:  (571) 262-1873



 -Original Message-
 From: Jun Rao [mailto:jun...@gmail.com]
 Sent: Sunday, February 23, 2014 4:19 PM
 To: users@kafka.apache.org
 Subject: Re: New Consumer API discussion

 Robert,

 For the push orient api, you can potentially implement your own
 MessageHandler with those methods. In the main loop of our new consumer
 api, you can just call those methods based on the events you get.

 Also, we already have an api to get the first and the last offset of a
 partition (getOffsetBefore).

 Thanks,

 Jun


 On Sat, Feb 22, 2014 at 11:29 AM

Re: New Consumer API discussion

2014-02-25 Thread Neha Narkhede
 saying it is possible to get events from the high-level consumer
 regarding various state machine changes?  For instance, can we get a
 notification when a rebalance starts and ends, when a partition is
 assigned/unassigned, when an offset is committed on a partition, when a
 leader changes and so on?  I call this OOB traffic, since they are not the
 core messages streaming, but side-band events, yet they are still
 potentially useful to consumers.

 Thank you,
 Robert


 Robert Withers
 Staff Analyst/Developer
 o: (720) 514-8963
 c:  (571) 262-1873



 -Original Message-
 From: Jun Rao [mailto:jun...@gmail.com]
 Sent: Sunday, February 23, 2014 4:19 PM
 To: users@kafka.apache.org
 Subject: Re: New Consumer API discussion

 Robert,

 For the push orient api, you can potentially implement your own
 MessageHandler with those methods. In the main loop of our new consumer
 api, you can just call those methods based on the events you get.

 Also, we already have an api to get the first and the last offset of a
 partition (getOffsetBefore).

 Thanks,

 Jun


 On Sat, Feb 22, 2014 at 11:29 AM, Withers, Robert robert.with...@dish.com wrote:

 This is a good idea, too.  I would modify it to include stream
 marking, then you can have:

 long end = consumer.lastOffset(tp);
 consumer.setMark(end);
 while(consumer.beforeMark()) {
   process(consumer.pollToMark());
 }

 or

 long end = consumer.lastOffset(tp);
 consumer.setMark(end);
 for(Object msg : consumer.iteratorToMark()) {
   process(msg);
 }

 I actually have 4 suggestions, then:

 *   pull: stream marking
 *   pull: finite streams, bound by time range (up-to-now, yesterday) or
 offset
 *   pull: async api
 *   push: KafkaMessageSource, for a push model, with msg and OOB events.
 Build one in either individual or chunk mode and have a listener for
 each msg or a listener for a chunk of msgs.  Make it composable and
 policy driven (chunked, range, commitOffsets policy, retry policy,
 transactional)

 Thank you,
 Robert
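
For what it is worth, the stream-marking idea above can be approximated on top of
the consumer API that later shipped, with no new surface area. The sketch below is
only an approximation under that assumption: endOffsets() arrived in the 0.10.1
Java client and stands in for the lastOffset()/setMark() calls proposed here, and
the drain may read slightly past the mark if producers are still writing.

  import java.util.Map;
  import org.apache.kafka.clients.consumer.Consumer;
  import org.apache.kafka.clients.consumer.ConsumerRecords;
  import org.apache.kafka.common.TopicPartition;

  // Hypothetical helper, not a Kafka API: capture the current end offset of each
  // assigned partition as the "mark", then drain until every partition reaches it.
  public class MarkedDrain {
      public static <K, V> void drainToMark(Consumer<K, V> consumer,
                                            java.util.function.Consumer<ConsumerRecords<K, V>> process) {
          // endOffsets() stands in for the proposed lastOffset()/setMark().
          Map<TopicPartition, Long> mark = consumer.endOffsets(consumer.assignment());
          while (!caughtUp(consumer, mark)) {
              process.accept(consumer.poll(100));
          }
      }

      private static boolean caughtUp(Consumer<?, ?> consumer, Map<TopicPartition, Long> mark) {
          for (Map.Entry<TopicPartition, Long> e : mark.entrySet()) {
              if (consumer.position(e.getKey()) < e.getValue()) {
                  return false;  // still behind the mark on at least one partition
              }
          }
          return true;
      }
  }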

 On Feb 22, 2014, at 11:21 AM, Jay Kreps <jay.kr...@gmail.com> wrote:

 I think what Robert is saying is that we need to think through the
 offset API to enable batch processing of topic data. Think of a
 process that periodically kicks off to compute a data summary or do a
 data load or something like that. I think what we need to support this
 is an api to fetch the last offset from the server for a partition.
 Something like
  long lastOffset(TopicPartition tp)
 and for symmetry
  long firstOffset(TopicPartition tp)

 Likely this would have to be batched. Essentially we should add this
 use case to our set of code examples to write and think through.

 The usage would be something like

 long end = consumer.lastOffset(tp);
 while(consumer.position < end)
   process(consumer.poll());

 -Jay


 On Sat, Feb 22, 2014 at 1:52 AM, Withers, Robert <robert.with...@dish.com> wrote:

 Jun,

 I was originally thinking a non-blocking read from a distributed
 stream should distinguish between no local messages, but a fetch is
 occurring
 versus you have drained the stream.  The reason this may be valuable
 to me is so I can write consumers that read all known traffic then
 terminate.
 You caused me to reconsider and I think I am conflating 2 things.  One
 is a sync/async api while the other is whether to have an infinite or
 finite stream.  Is it possible to build a finite KafkaStream on a
 range of messages?

 Perhaps a Simple Consumer would do just fine and then I could start
 off getting the writeOffset from zookeeper and tell it to read a
 specified range per partition.  I've done this and forked a simple
 consumer runnable for each partition, for one of our analyzers.  The
 great thing about the high-level consumer is that rebalance, so I can
 fork however many stream readers I want and you just figure it out for
 me.  In that way you offer us the control over the resource
 consumption within a pull model.  This is best to regulate message
 pressure, they say.

 Combining that high-level rebalance ability with a ranged partition
 drain could be really nice...build the stream with an ending position
 and it is a finite stream, but retain the high-level rebalance.  With
 a finite stream, you would know the difference of the 2 async
 scenarios: fetch-in-progress versus end-of-stream.  With an infinite
 stream, you never get end-of-stream.

 Aside from a high-level consumer over a finite range within each
 partition, the other feature I can think of is more complicated.  A
 high-level consumer has state machine changes that the client cannot
 access, to my knowledge.  Our use of kafka has us invoke a message
 handler with each message we consumer from the KafkaStream, so we
 convert a pull-model to a push-model.  Including the idea of receiving

Re: New Consumer API discussion

2014-02-25 Thread Neha Narkhede
 What's the use case of position()? Isn't that just the nextOffset() on the
 last message returned from poll()?

 Yes, except in the case where a rebalance is triggered and poll() is not
 yet invoked. Here, you would use position() to get the new fetch position
 for the specific partition. Even if this is not a common use case, IMO it
 is much easier to use position() to get the fetch offset than invoking
 nextOffset() on the last message. This also keeps the APIs symmetric, which
 is nice.
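
To make the position() point concrete, a small sketch follows. It uses
ConsumerRebalanceListener as the callback eventually shipped in the 0.9 client; the
draft discussed in this thread names it ConsumerRebalanceCallback and passes the
consumer into the method, so treat the wiring here as an assumption.

  import java.util.Collection;
  import org.apache.kafka.clients.consumer.Consumer;
  import org.apache.kafka.clients.consumer.ConsumerRebalanceListener;
  import org.apache.kafka.common.TopicPartition;

  // Log the fetch position of each newly assigned partition before the next poll()
  // returns any records.
  public class PositionLoggingListener implements ConsumerRebalanceListener {
      private final Consumer<?, ?> consumer;

      public PositionLoggingListener(Consumer<?, ?> consumer) {
          this.consumer = consumer;
      }

      @Override
      public void onPartitionsAssigned(Collection<TopicPartition> partitions) {
          for (TopicPartition tp : partitions) {
              // position() is the offset of the next record to be fetched, which is
              // exactly what nextOffset() on a "last message" cannot tell you right
              // after a rebalance.
              System.out.println("assigned " + tp + ", next fetch offset = " + consumer.position(tp));
          }
      }

      @Override
      public void onPartitionsRevoked(Collection<TopicPartition> partitions) {
          // No-op for this example.
      }
  }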




 On Mon, Feb 24, 2014 at 7:06 PM, Withers, Robert robert.with...@dish.com
  wrote:

 That's wonderful.  Thanks for kafka.

 Rob

 On Feb 24, 2014, at 9:58 AM, Guozhang Wang <wangg...@gmail.com> wrote:

 Hi Robert,

 Yes, you can check out the callback functions in the new API

 onPartitionDeassigned
 onPartitionAssigned

 and see if they meet your needs.

 Guozhang


 On Mon, Feb 24, 2014 at 8:18 AM, Withers, Robert <robert.with...@dish.com> wrote:

 Jun,

 Are you saying it is possible to get events from the high-level consumer
 regarding various state machine changes?  For instance, can we get a
 notification when a rebalance starts and ends, when a partition is
 assigned/unassigned, when an offset is committed on a partition, when a
 leader changes and so on?  I call this OOB traffic, since they are not
 the
 core messages streaming, but side-band events, yet they are still
 potentially useful to consumers.

 Thank you,
 Robert


 Robert Withers
 Staff Analyst/Developer
 o: (720) 514-8963
 c:  (571) 262-1873



 -Original Message-
 From: Jun Rao [mailto:jun...@gmail.com]
 Sent: Sunday, February 23, 2014 4:19 PM
 To: users@kafka.apache.org
 Subject: Re: New Consumer API discussion

 Robert,

 For the push-oriented api, you can potentially implement your own
 MessageHandler with those methods. In the main loop of our new consumer
 api, you can just call those methods based on the events you get.

 Also, we already have an api to get the first and the last offset of a
 partition (getOffsetBefore).

 Thanks,

 Jun


 On Sat, Feb 22, 2014 at 11:29 AM, Withers, Robert <robert.with...@dish.com> wrote:

 This is a good idea, too.  I would modify it to include stream
 marking, then you can have:

 long end = consumer.lastOffset(tp);
 consumer.setMark(end);
 while(consumer.beforeMark()) {
   process(consumer.pollToMark());
 }

 or

 long end = consumer.lastOffset(tp);
 consumer.setMark(end);
 for(Object msg : consumer.iteratorToMark()) {
   process(msg);
 }

 I actually have 4 suggestions, then:

 *   pull: stream marking
 *   pull: finite streams, bound by time range (up-to-now, yesterday) or
 offset
 *   pull: async api
 *   push: KafkaMessageSource, for a push model, with msg and OOB events.
 Build one in either individual or chunk mode and have a listener for
 each msg or a listener for a chunk of msgs.  Make it composable and
 policy driven (chunked, range, commitOffsets policy, retry policy,
 transactional)

 Thank you,
 Robert

 On Feb 22, 2014, at 11:21 AM, Jay Kreps <jay.kr...@gmail.com> wrote:

 I think what Robert is saying is that we need to think through the
 offset API to enable batch processing of topic data. Think of a
 process that periodically kicks off to compute a data summary or do a
 data load or something like that. I think what we need to support this
 is an api to fetch the last offset from the server for a partition.
 Something like
  long lastOffset(TopicPartition tp)
 and for symmetry
  long firstOffset(TopicPartition tp)

 Likely this would have to be batched. Essentially we should add this
 use case to our set of code examples to write and think through.

 The usage would be something like

 long end = consumer.lastOffset(tp);
 while(consumer.position < end)
   process(consumer.poll());

 -Jay


 On Sat, Feb 22, 2014 at 1:52 AM, Withers, Robert <robert.with...@dish.com> wrote:

 Jun,

 I was originally thinking a non-blocking read from a distributed
 stream should distinguish between no local messages, but a fetch is
 occurring
 versus you have drained the stream.  The reason this may be valuable
 to me is so I can write consumers that read all known traffic then
 terminate.
 You caused me to reconsider and I think I am conflating 2 things.  One
 is a sync/async api while the other is whether to have an infinite or
 finite stream.  Is it possible to build a finite KafkaStream on a
 range of messages?

 Perhaps a Simple Consumer would do just fine and then I could start
 off getting the writeOffset from zookeeper and tell it to read a
 specified range per partition.  I've done this and forked a simple
 consumer runnable for each partition, for one of our analyzers.  The
 great thing about the high-level consumer is that rebalance, so I can
 fork however many stream readers I want and you just figure it out for
 me

Re: New Consumer API discussion

2014-02-25 Thread Neha Narkhede
 Synchronously commits the specified offsets for the specified list of
 topics and partitions to *Kafka*. If no offsets are specified, commits
 offsets returned on the last {@link #poll(long, TimeUnit) poll()} for
 the subscribed list of topics and partitions.

 4. There is inconsistency in specifying partitions. Sometimes we use
 TopicPartition and some other times we use String and int (see
 examples below).

 void onPartitionsAssigned(Consumer consumer,
 TopicPartition...partitions)

 public void *subscribe*(java.lang.String topic, int... partitions)

 Yes, this was discussed previously. I think generally the consensus
 seems to be to use the higher level
 classes everywhere. Made those changes.

 What's the use case of position()? Isn't that just the nextOffset() on
 the
 last message returned from poll()?

 Yes, except in the case where a rebalance is triggered and poll() is not
 yet invoked. Here, you would use position() to get the new fetch position
 for the specific partition. Even if this is not a common use case, IMO it
 is much easier to use position() to get the fetch offset than invoking
 nextOffset() on the last message. This also keeps the APIs symmetric, which
 is nice.




 On Mon, Feb 24, 2014 at 7:06 PM, Withers, Robert 
 robert.with...@dish.com wrote:

 That's wonderful.  Thanks for kafka.

 Rob

 On Feb 24, 2014, at 9:58 AM, Guozhang Wang <wangg...@gmail.com> wrote:

 Hi Robert,

 Yes, you can check out the callback functions in the new API

 onPartitionDeassigned
 onPartitionAssigned

 and see if they meet your needs.

 Guozhang


 On Mon, Feb 24, 2014 at 8:18 AM, Withers, Robert <robert.with...@dish.com> wrote:

 Jun,

 Are you saying it is possible to get events from the high-level consumer
 regarding various state machine changes?  For instance, can we get a
 notification when a rebalance starts and ends, when a partition is
 assigned/unassigned, when an offset is committed on a partition, when a
 leader changes and so on?  I call this OOB traffic, since they are not
 the
 core messages streaming, but side-band events, yet they are still
 potentially useful to consumers.

 Thank you,
 Robert


 Robert Withers
 Staff Analyst/Developer
 o: (720) 514-8963
 c:  (571) 262-1873



 -Original Message-
 From: Jun Rao [mailto:jun...@gmail.com]
 Sent: Sunday, February 23, 2014 4:19 PM
 To: users@kafka.apache.org
 Subject: Re: New Consumer API discussion

 Robert,

 For the push-oriented api, you can potentially implement your own
 MessageHandler with those methods. In the main loop of our new consumer
 api, you can just call those methods based on the events you get.

 Also, we already have an api to get the first and the last offset of a
 partition (getOffsetBefore).

 Thanks,

 Jun


 On Sat, Feb 22, 2014 at 11:29 AM, Withers, Robert <robert.with...@dish.com> wrote:

 This is a good idea, too.  I would modify it to include stream
 marking, then you can have:

 long end = consumer.lastOffset(tp);
 consumer.setMark(end);
 while(consumer.beforeMark()) {
   process(consumer.pollToMark());
 }

 or

 long end = consumer.lastOffset(tp);
 consumer.setMark(end);
 for(Object msg : consumer.iteratorToMark()) {
   process(msg);
 }

 I actually have 4 suggestions, then:

 *   pull: stream marking
 *   pull: finite streams, bound by time range (up-to-now, yesterday) or
 offset
 *   pull: async api
 *   push: KafkaMessageSource, for a push model, with msg and OOB events.
 Build one in either individual or chunk mode and have a listener for
 each msg or a listener for a chunk of msgs.  Make it composable and
 policy driven (chunked, range, commitOffsets policy, retry policy,
 transactional)

 Thank you,
 Robert

 On Feb 22, 2014, at 11:21 AM, Jay Kreps <jay.kr...@gmail.com> wrote:

 I think what Robert is saying is that we need to think through the
 offset API to enable batch processing of topic data. Think of a
 process that periodically kicks off to compute a data summary or do a
 data load or something like that. I think what we need to support this
 is an api to fetch the last offset from the server for a partition.
 Something like
  long lastOffset(TopicPartition tp)
 and for symmetry
  long firstOffset(TopicPartition tp)

 Likely this would have to be batched. Essentially we should add this
 use case to our set of code examples to write and think through.

 The usage would be something like

 long end = consumer.lastOffset(tp);
 while(consumer.position < end)
   process(consumer.poll());

 -Jay


 On Sat, Feb 22, 2014 at 1:52 AM, Withers, Robert <robert.with...@dish.com> wrote:

 Jun,

 I was originally thinking a non-blocking read from a distributed
 stream should distinguish between no local messages, but a fetch is
 occurring
 versus you have drained the stream.  The reason this may be valuable
 to me

Re: New Consumer API discussion

2014-02-25 Thread Jun Rao
  Subject: Re: New Consumer API discussion
 
  Robert,
 
  For the push-oriented api, you can potentially implement your own
  MessageHandler with those methods. In the main loop of our new consumer
  api, you can just call those methods based on the events you get.
 
  Also, we already have an api to get the first and the last offset of a
  partition (getOffsetBefore).
 
  Thanks,
 
  Jun
 
 
  On Sat, Feb 22, 2014 at 11:29 AM, Withers, Robert <robert.with...@dish.com> wrote:
 
  This is a good idea, too.  I would modify it to include stream
  marking, then you can have:
 
  long end = consumer.lastOffset(tp);
  consumer.setMark(end);
  while(consumer.beforeMark()) {
process(consumer.pollToMark());
  }
 
  or
 
  long end = consumer.lastOffset(tp);
  consumer.setMark(end);
  for(Object msg : consumer.iteratorToMark()) {
process(msg);
  }
 
  I actually have 4 suggestions, then:
 
  *   pull: stream marking
  *   pull: finite streams, bound by time range (up-to-now, yesterday) or
  offset
  *   pull: async api
  *   push: KafkaMessageSource, for a push model, with msg and OOB events.
  Build one in either individual or chunk mode and have a listener for
  each msg or a listener for a chunk of msgs.  Make it composable and
  policy driven (chunked, range, commitOffsets policy, retry policy,
  transactional)
 
  Thank you,
  Robert
 
  On Feb 22, 2014, at 11:21 AM, Jay Kreps <jay.kr...@gmail.com> wrote:
 
  I think what Robert is saying is that we need to think through the
  offset API to enable batch processing of topic data. Think of a
  process that periodically kicks off to compute a data summary or do a
  data load or something like that. I think what we need to support this
  is an api to fetch the last offset from the server for a partition.
  Something like
   long lastOffset(TopicPartition tp)
  and for symmetry
   long firstOffset(TopicPartition tp)
 
  Likely this would have to be batched. Essentially we should add this
  use case to our set of code examples to write and think through.
 
  The usage would be something like
 
  long end = consumer.lastOffset(tp);
   while(consumer.position < end)
process(consumer.poll());
 
  -Jay
 
 
  On Sat, Feb 22, 2014 at 1:52 AM, Withers, Robert <robert.with...@dish.com> wrote:
 
  Jun,
 
  I was originally thinking a non-blocking read from a distributed
  stream should distinguish between no local messages, but a fetch is
  occurring
  versus you have drained the stream.  The reason this may be valuable
  to me is so I can write consumers that read all known traffic then
  terminate.
  You caused me to reconsider and I think I am conflating 2 things.  One
  is a sync/async api while the other is whether to have an infinite or
  finite stream.  Is it possible to build a finite KafkaStream on a
  range of messages?
 
  Perhaps a Simple Consumer would do just fine and then I could start
  off getting the writeOffset from zookeeper and tell it to read a
  specified range per partition.  I've done this and forked a simple
  consumer runnable for each partition, for one of our analyzers.  The
  great thing about the high-level consumer is that rebalance, so I can
  fork however many stream readers I want and you just figure it out for
  me.  In that way you offer us the control over the resource
  consumption within a pull model.  This is best to regulate message
  pressure, they say.
 
  Combining that high-level rebalance ability with a ranged partition
  drain could be really nice...build the stream with an ending position
  and it is a finite stream, but retain the high-level rebalance.  With
  a finite stream, you would know the difference of the 2 async
  scenarios: fetch-in-progress versus end-of-stream.  With an infinite
  stream, you never get end-of-stream.
 
  Aside from a high-level consumer over a finite range within each
  partition, the other feature I can think of is more complicated.  A
  high-level consumer has state machine changes that the client cannot
  access, to my knowledge.  Our use of kafka has us invoke a message
  handler with each message we consumer from the KafkaStream, so we
  convert a pull-model to a push-model.  Including the idea of receiving
  notifications from state machine changes, what would be really nice is
  to have a KafkaMessageSource, that is an eventful push model.  If it
  were thread-safe, then we could register listeners for various events:
 
  *   opening-stream
  *   closing-stream
  *   message-arrived
  *   end-of-stream/no-more-messages-in-partition (for finite streams)
  *   rebalance started
  *   partition assigned
  *   partition unassigned
  *   rebalance finished
  *   partition-offset-committed
 
  Perhaps that is just our use, but instead of a pull-oriented
  KafkaStream, is there any sense in your providing a push-oriented
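
Sketched as a Java interface, the OOB events Robert lists might look like the
following. It is purely hypothetical; nothing with this shape exists in the Kafka
clients, and the closest thing that shipped is ConsumerRebalanceListener, which
covers only the assign/revoke pair.

  import java.util.Collection;
  import org.apache.kafka.clients.consumer.ConsumerRecord;
  import org.apache.kafka.common.TopicPartition;

  // Hypothetical push-style listener for a KafkaMessageSource of the kind described above.
  public interface KafkaSourceListener<K, V> {
      void onStreamOpened();
      void onStreamClosed();
      void onMessage(ConsumerRecord<K, V> record);
      void onEndOfStream(TopicPartition partition);            // finite streams only
      void onRebalanceStarted();
      void onPartitionsAssigned(Collection<TopicPartition> partitions);
      void onPartitionsRevoked(Collection<TopicPartition> partitions);
      void onRebalanceFinished();
      void onOffsetCommitted(TopicPartition partition, long offset);
  }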

Re: New Consumer API discussion

2014-02-25 Thread Jay Kreps
 
  On Feb 24, 2014, at 9:58 AM, Guozhang Wang <wangg...@gmail.com> wrote:
 
  Hi Robert,
 
  Yes, you can check out the callback functions in the new API
 
   onPartitionDeassigned
  onPartitionAssigned
 
  and see if they meet your needs.
 
  Guozhang
 
 
  On Mon, Feb 24, 2014 at 8:18 AM, Withers, Robert <robert.with...@dish.com> wrote:
 
  Jun,
 
  Are you saying it is possible to get events from the high-level consumer
  regarding various state machine changes?  For instance, can we get a
  notification when a rebalance starts and ends, when a partition is
  assigned/unassigned, when an offset is committed on a partition, when a
  leader changes and so on?  I call this OOB traffic, since they are not
 the
  core messages streaming, but side-band events, yet they are still
  potentially useful to consumers.
 
  Thank you,
  Robert
 
 
  Robert Withers
  Staff Analyst/Developer
  o: (720) 514-8963
  c:  (571) 262-1873
 
 
 
  -Original Message-
  From: Jun Rao [mailto:jun...@gmail.com]
  Sent: Sunday, February 23, 2014 4:19 PM
  To: users@kafka.apache.org
  Subject: Re: New Consumer API discussion
 
  Robert,
 
   For the push-oriented api, you can potentially implement your own
  MessageHandler with those methods. In the main loop of our new consumer
  api, you can just call those methods based on the events you get.
 
  Also, we already have an api to get the first and the last offset of a
  partition (getOffsetBefore).
 
  Thanks,
 
  Jun
 
 
   On Sat, Feb 22, 2014 at 11:29 AM, Withers, Robert <robert.with...@dish.com> wrote:
 
  This is a good idea, too.  I would modify it to include stream
  marking, then you can have:
 
  long end = consumer.lastOffset(tp);
  consumer.setMark(end);
  while(consumer.beforeMark()) {
process(consumer.pollToMark());
  }
 
  or
 
  long end = consumer.lastOffset(tp);
  consumer.setMark(end);
  for(Object msg : consumer.iteratorToMark()) {
process(msg);
  }
 
  I actually have 4 suggestions, then:
 
  *   pull: stream marking
  *   pull: finite streams, bound by time range (up-to-now, yesterday) or
  offset
  *   pull: async api
  *   push: KafkaMessageSource, for a push model, with msg and OOB events.
  Build one in either individual or chunk mode and have a listener for
  each msg or a listener for a chunk of msgs.  Make it composable and
  policy driven (chunked, range, commitOffsets policy, retry policy,
  transactional)
 
  Thank you,
  Robert
 
   On Feb 22, 2014, at 11:21 AM, Jay Kreps <jay.kr...@gmail.com> wrote:
 
  I think what Robert is saying is that we need to think through the
  offset API to enable batch processing of topic data. Think of a
  process that periodically kicks off to compute a data summary or do a
  data load or something like that. I think what we need to support this
  is an api to fetch the last offset from the server for a partition.
  Something like
   long lastOffset(TopicPartition tp)
  and for symmetry
   long firstOffset(TopicPartition tp)
 
  Likely this would have to be batched. Essentially we should add this
  use case to our set of code examples to write and think through.
 
  The usage would be something like
 
  long end = consumer.lastOffset(tp);
   while(consumer.position < end)
process(consumer.poll());
 
  -Jay
 
 
   On Sat, Feb 22, 2014 at 1:52 AM, Withers, Robert <robert.with...@dish.com> wrote:
 
  Jun,
 
  I was originally thinking a non-blocking read from a distributed
  stream should distinguish between no local messages, but a fetch is
  occurring
  versus you have drained the stream.  The reason this may be valuable
  to me is so I can write consumers that read all known traffic then
  terminate.
  You caused me to reconsider and I think I am conflating 2 things.  One
  is a sync/async api while the other is whether to have an infinite or
  finite stream.  Is it possible to build a finite KafkaStream on a
  range of messages?
 
  Perhaps a Simple Consumer would do just fine and then I could start
  off getting the writeOffset from zookeeper and tell it to read a
  specified range per partition.  I've done this and forked a simple
  consumer runnable for each partition, for one of our analyzers.  The
  great thing about the high-level consumer is that rebalance, so I can
  fork however many stream readers I want and you just figure it out for
  me.  In that way you offer us the control over the resource
  consumption within a pull model.  This is best to regulate message
  pressure, they say.
 
  Combining that high-level rebalance ability with a ranged partition
  drain could be really nice...build the stream with an ending position
  and it is a finite stream, but retain the high-level rebalance.  With
  a finite stream, you would know the difference

Re: New Consumer API discussion

2014-02-25 Thread Jay Kreps
.
 
  The hope is to convey that if no partitions are specified, offsets will
  be committed for the subscribed list of partitions. One improvement
 could
  be to
  explicitly state that the offsets returned on the last poll will be
  committed. I updated this to -
 
  Synchronously commits the specified offsets for the specified list of
  topics and partitions to *Kafka*. If no offsets are specified, commits
  offsets returned on the last {@link #poll(long, TimeUnit) poll()} for
  the subscribed list of topics and partitions.
 
  4. There is inconsistency in specifying partitions. Sometimes we use
  TopicPartition and some other times we use String and int (see
  examples below).
 
  void onPartitionsAssigned(Consumer consumer,
  TopicPartition...partitions)
 
  public void *subscribe*(java.lang.String topic, int... partitions)
 
  Yes, this was discussed previously. I think generally the consensus
  seems to be to use the higher level
  classes everywhere. Made those changes.
 
  What's the use case of position()? Isn't that just the nextOffset() on
  the
  last message returned from poll()?
 
  Yes, except in the case where a rebalance is triggered and poll() is
 not
  yet invoked. Here, you would use position() to get the new fetch
 position
  for the specific partition. Even if this is not a common use case, IMO
 it
  is much easier to use position() to get the fetch offset than invoking
  nextOffset() on the last message. This also keeps the APIs symmetric,
 which
  is nice.
 
 
 
 
  On Mon, Feb 24, 2014 at 7:06 PM, Withers, Robert 
  robert.with...@dish.com wrote:
 
  That's wonderful.  Thanks for kafka.
 
  Rob
 
   On Feb 24, 2014, at 9:58 AM, Guozhang Wang <wangg...@gmail.com> wrote:
 
  Hi Robert,
 
  Yes, you can check out the callback functions in the new API
 
   onPartitionDeassigned
  onPartitionAssigned
 
  and see if they meet your needs.
 
  Guozhang
 
 
   On Mon, Feb 24, 2014 at 8:18 AM, Withers, Robert <robert.with...@dish.com> wrote:
 
  Jun,
 
  Are you saying it is possible to get events from the high-level
 consumer
  regarding various state machine changes?  For instance, can we get a
  notification when a rebalance starts and ends, when a partition is
  assigned/unassigned, when an offset is committed on a partition, when
 a
  leader changes and so on?  I call this OOB traffic, since they are not
  the
  core messages streaming, but side-band events, yet they are still
  potentially useful to consumers.
 
  Thank you,
  Robert
 
 
  Robert Withers
  Staff Analyst/Developer
  o: (720) 514-8963
  c:  (571) 262-1873
 
 
 
  -Original Message-
  From: Jun Rao [mailto:jun...@gmail.com]
  Sent: Sunday, February 23, 2014 4:19 PM
   To: users@kafka.apache.org
  Subject: Re: New Consumer API discussion
 
  Robert,
 
   For the push-oriented api, you can potentially implement your own
  MessageHandler with those methods. In the main loop of our new
 consumer
  api, you can just call those methods based on the events you get.
 
  Also, we already have an api to get the first and the last offset of a
  partition (getOffsetBefore).
 
  Thanks,
 
  Jun
 
 
   On Sat, Feb 22, 2014 at 11:29 AM, Withers, Robert <robert.with...@dish.com> wrote:
 
  This is a good idea, too.  I would modify it to include stream
  marking, then you can have:
 
  long end = consumer.lastOffset(tp);
  consumer.setMark(end);
  while(consumer.beforeMark()) {
process(consumer.pollToMark());
  }
 
  or
 
  long end = consumer.lastOffset(tp);
  consumer.setMark(end);
  for(Object msg : consumer.iteratorToMark()) {
process(msg);
  }
 
  I actually have 4 suggestions, then:
 
  *   pull: stream marking
  *   pull: finite streams, bound by time range (up-to-now, yesterday)
 or
  offset
  *   pull: async api
  *   push: KafkaMessageSource, for a push model, with msg and OOB
 events.
  Build one in either individual or chunk mode and have a listener for
  each msg or a listener for a chunk of msgs.  Make it composable and
  policy driven (chunked, range, commitOffsets policy, retry policy,
  transactional)
 
  Thank you,
  Robert
 
   On Feb 22, 2014, at 11:21 AM, Jay Kreps <jay.kr...@gmail.com> wrote:
 
  I think what Robert is saying is that we need to think through the
  offset API to enable batch processing of topic data. Think of a
  process that periodically kicks off to compute a data summary or do a
  data load or something like that. I think what we need to support this
  is an api to fetch the last offset from the server for a partition.
  Something like
   long lastOffset(TopicPartition tp)
  and for symmetry
   long firstOffset(TopicPartition tp)
 
  Likely this would have to be batched. Essentially we should add this
  use case to our set of code examples to write and think through.
 
  The usage would be something like

Re: New Consumer API discussion

2014-02-25 Thread Neha Narkhede
How do you know n? The whole point is that you need to be able to fetch the
end offset. You can't a priori decide you will load 1m messages without
knowing what is there.

Hmm. I think what you are pointing out is that in the new consumer API, we
don't have a way to issue the equivalent of the existing getOffsetsBefore()
API. Agree that is a flaw that we should fix.
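
For reference, the existing getOffsetsBefore() path being discussed is the 0.8
SimpleConsumer API. Roughly, fetching the log end offset of a single partition
looks like the sketch below; host, port, topic and client id are placeholders, and
the exact constructor arguments may differ slightly across 0.8 versions.

  import java.util.Collections;
  import java.util.Map;
  import kafka.api.PartitionOffsetRequestInfo;
  import kafka.common.TopicAndPartition;
  import kafka.javaapi.OffsetResponse;
  import kafka.javaapi.consumer.SimpleConsumer;

  // Ask a broker for the latest (log end) offset of one partition via getOffsetsBefore().
  public class EndOffsetLookup {
      public static long logEndOffset(String host, int port, String topic, int partition) {
          SimpleConsumer consumer = new SimpleConsumer(host, port, 100000, 64 * 1024, "end-offset-lookup");
          try {
              TopicAndPartition tp = new TopicAndPartition(topic, partition);
              Map<TopicAndPartition, PartitionOffsetRequestInfo> request = Collections.singletonMap(
                      tp, new PartitionOffsetRequestInfo(kafka.api.OffsetRequest.LatestTime(), 1));
              OffsetResponse response = consumer.getOffsetsBefore(new kafka.javaapi.OffsetRequest(
                      request, kafka.api.OffsetRequest.CurrentVersion(), "end-offset-lookup"));
              return response.offsets(topic, partition)[0];  // latest offset for the partition
          } finally {
              consumer.close();
          }
      }
  }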

Will update the docs/wiki with a few use cases that I've collected so far
and see if the API covers those.

I would prefer PartitionsAssigned and PartitionsRevoked as that seems
clearer to me

Well the RebalanceBeginCallback interface will have onPartitionsAssigned()
as the callback. Similarly, the RebalanceEndCallback interface will have
onPartitionsRevoked() as the callback. Makes sense?

Thanks,
Neha
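
A quick sketch of why the single-method shape matters for the Java 8 point raised
earlier: with one method per interface, both callbacks can be supplied as lambdas.
The interfaces below mirror the proposal in this thread and are not a shipped API.

  import java.util.Collection;
  import java.util.Collections;
  import org.apache.kafka.common.TopicPartition;

  // Hypothetical single-method callback interfaces, as proposed above.
  interface RebalanceBeginCallback {
      void onPartitionsAssigned(Collection<TopicPartition> partitions);
  }

  interface RebalanceEndCallback {
      void onPartitionsRevoked(Collection<TopicPartition> partitions);
  }

  class CallbackSketch {
      static void example() {
          // Because each interface has exactly one method, Java 8 lambdas work directly.
          RebalanceBeginCallback begin = partitions -> System.out.println("assigned: " + partitions);
          RebalanceEndCallback end = partitions -> System.out.println("revoked: " + partitions);
          begin.onPartitionsAssigned(Collections.<TopicPartition>emptyList());
          end.onPartitionsRevoked(Collections.<TopicPartition>emptyList());
      }
  }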


On Tue, Feb 25, 2014 at 2:38 PM, Jay Kreps jay.kr...@gmail.com wrote:

 1. I would prefer PartitionsAssigned and PartitionsRevoked as that seems
 clearer to me.

 -Jay


 On Tue, Feb 25, 2014 at 10:19 AM, Neha Narkhede neha.narkh...@gmail.com
 wrote:

  Thanks for the reviews so far! There are a few outstanding questions -
 
  1.  It will be good to make the rebalance callbacks forward compatible
 with
  Java 8 capabilities. We can change it to PartitionsAssignedCallback
  and PartitionsRevokedCallback or RebalanceBeginCallback and
  RebalanceEndCallback?
 
  If there are no objections, I will change it to RebalanceBeginCallback
 and
  RebalanceEndCallback.
 
  2.  The return type for committed() is List<TopicPartitionOffset>. There
  was a suggestion to change it to either be Map<TopicPartition, Long> or
  Map<TopicPartition, TopicPartitionOffset>
 
  Do people have feedback on this suggestion?
 
 
  On Tue, Feb 25, 2014 at 9:56 AM, Neha Narkhede neha.narkh...@gmail.com
  wrote:
 
   Robert,
  
   Are you saying it is possible to get events from the high-level
  consumerregarding various state machine changes?  For instance, can we
 get a
   notification when a rebalance starts and ends, when a partition is
   assigned/unassigned, when an offset is committed on a partition, when a
   leader changes and so on?  I call this OOB traffic, since they are not
  the
   core messages streaming, but side-band events, yet they are still
   potentially useful to consumers.
  
   In the current proposal, you get notified when the state machine
 changes
   i.e. before and after a rebalance is triggered. Look at
   ConsumerRebalanceCallback
 
 http://people.apache.org/~nehanarkhede/kafka-0.9-consumer-javadoc/doc/kafka/clients/consumer/ConsumerRebalanceCallback.html
  
   .Leader changes do not count as state machine changes for consumer
   rebalance purposes.
  
   Thanks,
   Neha
  
  
   On Tue, Feb 25, 2014 at 9:54 AM, Neha Narkhede 
 neha.narkh...@gmail.com
  wrote:
  
   Jay/Robert -
  
  
   I think what Robert is saying is that we need to think through the
  offset
   API to enable batch processing of topic data. Think of a process
 that
   periodically kicks off to compute a data summary or do a data load or
   something like that. I think what we need to support this is an api to
   fetch the last offset from the server for a partition. Something like
  long lastOffset(TopicPartition tp)
   and for symmetry
  long firstOffset(TopicPartition tp)
  
   Likely this would have to be batched.
  
   A fixed range of data load can be done using the existing APIs as
   follows. This assumes you know the endOffset which can be
 currentOffset
  + n
   (number of messages in the load)
  
    long startOffset = consumer.position(partition);
    long endOffset = startOffset + n;
    while(consumer.position(partition) <= endOffset) {
        List<ConsumerRecord> messages = consumer.poll(timeout, TimeUnit.MILLISECONDS);
        process(messages, endOffset);  // processes messages until endOffset
    }
  
   Does that make sense?
  
  
   On Tue, Feb 25, 2014 at 9:49 AM, Neha Narkhede 
 neha.narkh...@gmail.com
  wrote:
  
   Thanks for the review, Jun. Here are some comments -
  
  
   1. The using of ellipsis: This may make passing a list of items from
 a
   collection to the api a bit harder. Suppose that you have a list of
   topics
   stored in
  
    ArrayList<String> topics;

    If you want to subscribe to all topics in one call, you will have to do:

    String[] topicArray = new String[topics.size()];
    consumer.subscribe(topics.toArray(topicArray));
  
   A similar argument can be made for arguably the more common use case
 of
   subscribing to a single topic as well. In these cases, user is
 required
   to write more
   code to create a single item collection and pass it in. Since
   subscription is extremely lightweight
   invoking it multiple times also seems like a workable solution, no?
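
To make the ergonomics concrete, the two call styles look like this. DraftConsumer
is a minimal stand-in for the draft interface quoted above, not a shipped API.

  import java.util.List;

  // Minimal stand-in for the draft consumer interface under discussion.
  interface DraftConsumer {
      void subscribe(String... topics);
  }

  class SubscribeStyles {
      // Subscribing to a collection of topics forces a copy into an array first.
      static void fromCollection(DraftConsumer consumer, List<String> topics) {
          consumer.subscribe(topics.toArray(new String[0]));
      }

      // Since subscription is lightweight, one call per topic is the workable alternative.
      static void oneCallPerTopic(DraftConsumer consumer, List<String> topics) {
          for (String topic : topics) {
              consumer.subscribe(topic);
          }
      }
  }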
  
   2. It would be good to document that the following apis are mutually
   exclusive. Also, if the partition level subscription is specified,
  there
   is
   no group management. Finally, unsubscribe() can only be used to
 cancel
   subscriptions
