Re: quick question about new consumer api

2014-07-07 Thread Guozhang Wang
We plan to have a working prototype ready by the end of September.

Guozhang


On Mon, Jul 7, 2014 at 11:05 AM, Jason Rosenberg wrote:

> Great, that's reassuring!
>
> What's the time frame for having a more or less stable version to try out?
>
> Jason



-- 
-- Guozhang


Re: quick question about new consumer api

2014-07-07 Thread Jason Rosenberg
Great, that's reassuring!

What's the time frame for having a more or less stable version to try out?

Jason


On Mon, Jul 7, 2014 at 12:59 PM, Guozhang Wang wrote:

> I see your point now. The old consumer does have a hard-coded
> "round-robin-per-topic" logic, which has this issue. In the new consumer,
> we will make the assignment logic customizable so that people can specify
> whichever rebalance algorithm they like.
>
> I will also send out a summary of the new consumer design soon for
> further comments. Feel free to share any other thoughts you have about
> the new consumer design.
>
> Guozhang


Re: quick question about new consumer api

2014-07-07 Thread Guozhang Wang
I see your point now. The old consumer does have a hard-coded
"round-robin-per-topic" logic, which has this issue. In the new consumer,
we will make the assignment logic customizable so that people can specify
whichever rebalance algorithm they like.
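
As a rough illustration of what a customizable assignment could look like,
here is a minimal sketch in Python. It is a simplified model, not the new
consumer's API: the names `rebalance` and `round_robin_over_partitions`,
the `strategy` parameter, and the topic and consumer names are all
hypothetical. The strategy shown deals partitions out across the whole
group, regardless of topic.

```python
from itertools import cycle


def round_robin_over_partitions(topics, consumers):
    """Deal every (topic, partition) pair out across the whole group.

    topics maps a topic name to its partition count (assumed input shape).
    """
    ordered = sorted(consumers)
    assignment = {consumer: [] for consumer in ordered}
    # Flatten all topics into one sorted list of partitions before dealing,
    # so single-partition topics are spread across consumers too.
    all_partitions = [(t, p) for t in sorted(topics) for p in range(topics[t])]
    for topic_partition, owner in zip(all_partitions, cycle(ordered)):
        assignment[owner].append(topic_partition)
    return assignment


def rebalance(topics, consumers, strategy=round_robin_over_partitions):
    """Hypothetical hook: the caller chooses the assignment strategy."""
    return strategy(topics, consumers)


# Three single-partition topics shared by two consumers.
plan = rebalance({"logs-a": 1, "logs-b": 1, "logs-c": 1}, ["c1", "c2"])
print(plan)
```

In this model, any function that takes the topics and the group members and
returns a mapping from consumer to partitions can be passed as `strategy`.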

I will also send out a summary of the new consumer design soon for
further comments. Feel free to share any other thoughts you have about
the new consumer design.

Guozhang


On Mon, Jul 7, 2014 at 8:44 AM, Jason Rosenberg wrote:

> Guozhang,
>
> I'm not suggesting we parallelize within a partition.
>
> The problem with the current high-level consumer is that if you use a regex
> to select multiple topics and run multiple consumers in the same group, the
> first consumer usually ends up 'owning' all the topics, and no amount of
> subsequent rebalancing lets other consumers in the group own some of them.
> Rebalancing does spread the partitions of a multi-partition topic across
> consumers, but when topics have only 1 partition each, the first consumer
> to initialize gets all the work.
>
> So I'm wondering whether the new API will rebalance the work at the
> partition level rather than the topic level.
>
> Jason



-- 
-- Guozhang


Re: quick question about new consumer api

2014-07-07 Thread Jason Rosenberg
Guozhang,

I'm not suggesting we parallelize within a partition.

The problem with the current high-level consumer is that if you use a regex
to select multiple topics and run multiple consumers in the same group, the
first consumer usually ends up 'owning' all the topics, and no amount of
subsequent rebalancing lets other consumers in the group own some of them.
Rebalancing does spread the partitions of a multi-partition topic across
consumers, but when topics have only 1 partition each, the first consumer
to initialize gets all the work.

So I'm wondering whether the new API will rebalance the work at the
partition level rather than the topic level.
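
To make the skew concrete, here is a minimal sketch in Python of a
per-topic assignment. It is a simplified model under stated assumptions,
not the actual high-level consumer code: each topic's partitions are dealt
out starting again from the first consumer, and "first" is taken to mean
first in sorted order. The function, topic, and consumer names are
hypothetical.

```python
def assign_per_topic(topics, consumers):
    """Assign partitions one topic at a time (simplified, assumed model).

    topics maps a topic name to its partition count.
    """
    ordered = sorted(consumers)
    assignment = {consumer: [] for consumer in ordered}
    for topic in sorted(topics):
        # The dealing restarts at the first consumer for every topic, so a
        # topic with one partition always goes to that same consumer.
        for partition in range(topics[topic]):
            owner = ordered[partition % len(ordered)]
            assignment[owner].append((topic, partition))
    return assignment


# Three single-partition topics: the second consumer stays idle.
single = assign_per_topic({"logs-a": 1, "logs-b": 1, "logs-c": 1}, ["c1", "c2"])
print(single)

# One four-partition topic: the work is split evenly.
multi = assign_per_topic({"logs-a": 4}, ["c1", "c2"])
print(multi)
```

With the single-partition topics, `c1` ends up with all three partitions
and `c2` with none; with the four-partition topic, each gets two.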

Jason


On Mon, Jul 7, 2014 at 11:17 AM, Guozhang Wang wrote:

> Hi Jason,
>
> In the new design, consumption is still at per-partition granularity. The
> main rationale is ordering: within a partition we want to preserve order,
> so that a message B produced after message A is also consumed and
> processed after A. Producers can use keys to make sure that messages in
> the same ordering group land in the same partition. To guarantee this,
> each partition must be consumed by only a single client at a time. On the
> other hand, if you want to increase the number of consumers beyond the
> number of partitions, you can always use the topic tool to dynamically
> add more partitions to the topic.
>
> Do you have a specific scenario in mind that would require single-partition
> topics?
>
> Guozhang


Re: quick question about new consumer api

2014-07-07 Thread Guozhang Wang
Hi Jason,

In the new design, consumption is still at per-partition granularity. The
main rationale is ordering: within a partition we want to preserve order,
so that a message B produced after message A is also consumed and
processed after A. Producers can use keys to make sure that messages in
the same ordering group land in the same partition. To guarantee this,
each partition must be consumed by only a single client at a time. On the
other hand, if you want to increase the number of consumers beyond the
number of partitions, you can always use the topic tool to dynamically
add more partitions to the topic.
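
The keyed-ordering idea can be sketched in a few lines of Python. This is
an illustration only: the CRC32-modulo mapping is an assumption chosen for
the sketch, not necessarily what the Kafka producer uses, and
`partition_for`, `produce`, and the keys are hypothetical names.

```python
import zlib


def partition_for(key, num_partitions):
    """Map a key to a partition with a stable hash (assumed scheme)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def produce(partitions, key, value):
    """Append the message to the partition chosen by its key."""
    index = partition_for(key, len(partitions))
    partitions[index].append((key, value))
    return index


# Four partitions, each modelled as an append-only list.
partitions = [[] for _ in range(4)]
for key, value in [("user-1", "A"), ("user-2", "X"), ("user-1", "B")]:
    produce(partitions, key, value)

# Both "user-1" messages sit in one partition, in production order, so a
# single consumer reading that partition sees A before B.
user1_partition = partition_for("user-1", len(partitions))
user1_values = [v for k, v in partitions[user1_partition] if k == "user-1"]
print(user1_partition, user1_values)
```

Note that under a modulo scheme like this one, changing the partition
count can map an existing key to a different partition, so per-key order
holds only among messages produced under the same partition count.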

Do you have a specific scenario in mind that would require single-partition
topics?

Guozhang



On Mon, Jul 7, 2014 at 7:43 AM, Jason Rosenberg wrote:

> I've been looking at the new consumer api outlined here:
>
> https://cwiki.apache.org/confluence/display/KAFKA/Kafka+0.9+Consumer+Rewrite+Design
>
> One issue in the current high-level consumer is that it does not do a good
> job of distributing a set of topics among multiple consumers unless each
> topic has multiple partitions. This has always seemed strange to me, since
> at the end of the day, even for single-partition topics, the basic unit of
> consumption is still the partition, so you'd expect rebalancing to try to
> distribute partitions evenly, regardless of topic.
>
> It's not clearly spelled out in the new consumer API wiki, so I'll just
> ask: will this issue be addressed in the new API? I think I've asked this
> before, but I wanted to check again, as I don't see it explicitly
> addressed in the design.
>
> Thanks
>
> Jason
>



-- 
-- Guozhang


quick question about new consumer api

2014-07-07 Thread Jason Rosenberg
I've been looking at the new consumer api outlined here:
https://cwiki.apache.org/confluence/display/KAFKA/Kafka+0.9+Consumer+Rewrite+Design

One issue in the current high-level consumer is that it does not do a good
job of distributing a set of topics among multiple consumers unless each
topic has multiple partitions. This has always seemed strange to me, since
at the end of the day, even for single-partition topics, the basic unit of
consumption is still the partition, so you'd expect rebalancing to try to
distribute partitions evenly, regardless of topic.

It's not clearly spelled out in the new consumer API wiki, so I'll just
ask: will this issue be addressed in the new API? I think I've asked this
before, but I wanted to check again, as I don't see it explicitly
addressed in the design.

Thanks

Jason