Re: Re: Topic discovery when supporting multiple kafka clusters

2016-12-06 Thread Yifan Ying
Hi Brian,

We have 5 brokers and ~80 topics. The total number of partitions is around
7k, not including replicas (so it's close to the limit that Netflix
recommends). Most topics have an RF of 2. CPU usage is only around 25%.
Each topic has around 3-4 consumers on average. Disk space is our current
bottleneck, as some topics produce relatively large messages, so we have
had to lower retention for some topics to only 1 hour. When adding our 5th
broker, we had trouble migrating the __consumer_offsets topic because of
https://issues.apache.org/jira/browse/KAFKA-4362. So __consumer_offsets
has to live on the first 4 brokers even if we keep adding brokers.

We want to add a new cluster for a specific group of topics that carry
large messages and need much longer retention. This is also meant to reduce
operational complexity. I am open to any suggestions on scaling the
current cluster, but I am also curious to learn how people do topic discovery.
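
A minimal sketch of the kind of topic-to-cluster lookup being asked about.
Everything here is an assumption for illustration: the class names, the
cluster names, the host addresses, and the fallback-to-default rule are all
hypothetical, and the plain dict stands in for whatever shared store (for
example znodes in ZooKeeper) would actually hold the mapping.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClusterInfo:
    name: str
    bootstrap_servers: str  # comma-separated host:port list


class TopicDirectory:
    """Maps topic names to the cluster that hosts them.

    The backing store is an in-memory dict; a real deployment would read
    the same mapping from a shared store (assumption, not from the thread).
    """

    def __init__(self, default_cluster):
        self._default = default_cluster
        self._by_topic = {}

    def register(self, topic, cluster):
        self._by_topic[topic] = cluster

    def lookup(self, topic):
        # Topics that were never registered fall back to the original cluster.
        return self._by_topic.get(topic, self._default)


# Hypothetical clusters and topics, for illustration only.
main = ClusterInfo("main", "kafka-main-1:9092,kafka-main-2:9092")
large = ClusterInfo("large-messages", "kafka-large-1:9092")

directory = TopicDirectory(default_cluster=main)
directory.register("video-uploads", large)

print(directory.lookup("video-uploads").bootstrap_servers)
print(directory.lookup("page-views").bootstrap_servers)
```

An application would call lookup() once at startup and pass the returned
string as bootstrap.servers to its producer or consumer.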

On Tue, Dec 6, 2016 at 12:37 PM, Brian Krahmer wrote:

> You didn't mention anything about your current configuration, just that
> you are 'out of resources'.  Perhaps you misunderstand how to size your
> partitions per topic, and how partition allocation works.  If your brokers
> are maxed on cpu, and you double the number of brokers but keep the replica
> count the same, I would expect cpu usage to nearly get cut in half.  How
> many brokers do you have, how many topics do you have and how many
> partitions per topic do you have?  What is your resource utilization for
> bandwidth, CPU, and memory?  How many average consumers do you have for
> each topic?
>
> brian
>
>
>
> On 06.12.2016 21:23, Yifan Ying wrote:
>
>> Hi Aseem, the concern is to create too many partitions in total in one
>> cluster no matter how many brokers I have in this cluster. I think the two
>> articles that I mentioned explain why too many partitions in one cluster
>> could cause issues.
>>
>>
>>
>


-- 
Yifan


Re: Re: Topic discovery when supporting multiple kafka clusters

2016-12-06 Thread Brian Krahmer
You didn't mention anything about your current configuration, just that
you are 'out of resources'.  Perhaps you misunderstand how to size your
partitions per topic, and how partition allocation works.  If your
brokers are maxed out on CPU, and you double the number of brokers but
keep the replica count the same, I would expect CPU usage to be cut
nearly in half.  How many brokers do you have, how many topics do you
have, and how many partitions per topic do you have?  What is your
resource utilization for bandwidth, CPU, and memory?  How many consumers
do you have per topic on average?
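
The "cut in half" estimate can be checked with some rough arithmetic. This
is only a sketch: it uses the figures reported elsewhere in this thread
(about 7k partitions, 5 brokers) and assumes that every topic has RF 2,
that replicas are spread evenly, and that per-broker load scales with the
number of replicas hosted. The function name is made up for illustration.

```python
def replicas_per_broker(partitions, replication_factor, brokers):
    """Average number of partition replicas hosted by each broker,
    assuming replicas are spread evenly across the cluster."""
    return partitions * replication_factor / brokers


# Figures from the thread: ~7k partitions, RF 2 (assumed for all topics).
before = replicas_per_broker(7000, 2, 5)   # current 5 brokers
after = replicas_per_broker(7000, 2, 10)   # doubled to 10 brokers

print(before, after)   # 2800.0 1400.0
print(after / before)  # 0.5
```

Note that the cluster-wide partition count is unchanged by adding brokers;
only the per-broker share drops.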


brian


On 06.12.2016 21:23, Yifan Ying wrote:

> Hi Aseem, the concern is to create too many partitions in total in one
> cluster no matter how many brokers I have in this cluster. I think the two
> articles that I mentioned explain why too many partitions in one cluster
> could cause issues.






Re: Topic discovery when supporting multiple kafka clusters

2016-12-06 Thread Yifan Ying
Hi Aseem, the concern is creating too many partitions in total in one
cluster, no matter how many brokers the cluster has. I think the two
articles that I mentioned explain why too many partitions in one cluster
could cause issues.

On Tue, Dec 6, 2016 at 12:08 PM, Aseem Bansal wrote:

> @Yifan Ying Why not add more brokers in your cluster? That will not
> increase the partitions. Does increasing the number of brokers cause you
> any problem? How many brokers do you have in the cluster already?



-- 
Yifan


Re: Topic discovery when supporting multiple kafka clusters

2016-12-06 Thread Aseem Bansal
@Yifan Ying Why not add more brokers to your cluster? That will not
increase the number of partitions. Does increasing the number of brokers
cause you any problems? How many brokers do you have in the cluster already?

On Wed, Dec 7, 2016 at 12:35 AM, Yifan Ying wrote:

> Thanks Asaf, Aseem.
>
> Assigning topics to only a specific set of brokers will probably cause
> uneven traffic and it won't prevent topics to be re-assigned to other
> brokers when brokers fail.
>
> Like I said, the original cluster is close to out of resources. I remember
> there's some limit on # of partitions that each Kafka cluster can have.
> Netflix recommends to keep it below 10k to improve availability and reduce
> latency,
> http://techblog.netflix.com/2016/04/kafka-inside-keystone-pipeline.html.
> Jun Rao also wrote a blog(
> https://www.confluent.io/blog/how-to-choose-the-number-of-topicspartitions-in-a-kafka-cluster/)
> about how too many partitions could hurt availability and latency. That's
> why we want to create another cluster instead of expanding the current one.
> I know a lot of companies are maintaining multiple clusters, and I'm
> curious how people are doing topic discovery.
>
>
>
> --
> Yifan
>


Re: Topic discovery when supporting multiple kafka clusters

2016-12-06 Thread Yifan Ying
Thanks Asaf, Aseem.

Assigning topics to only a specific set of brokers will probably cause
uneven traffic, and it won't prevent topics from being re-assigned to other
brokers when brokers fail.

Like I said, the original cluster is close to running out of resources. I
remember there's some limit on the number of partitions that each Kafka
cluster can have. Netflix recommends keeping it below 10k to improve
availability and reduce latency:
http://techblog.netflix.com/2016/04/kafka-inside-keystone-pipeline.html.
Jun Rao also wrote a blog post
(https://www.confluent.io/blog/how-to-choose-the-number-of-topicspartitions-in-a-kafka-cluster/)
about how too many partitions could hurt availability and latency. That's
why we want to create another cluster instead of expanding the current one.
I know a lot of companies maintain multiple clusters, and I'm
curious how people do topic discovery.



On Tue, Dec 6, 2016 at 4:04 AM, Aseem Bansal wrote:

> What configurations allow you to assign topics to specific brokers?
>
> I can see https://kafka.apache.org/documentation#basic_ops_automigrate.
> This should allow you to move things around but does that keep anything
> from being re-assigned to the old ones?



-- 
Yifan


Re: Topic discovery when supporting multiple kafka clusters

2016-12-06 Thread Aseem Bansal
What configurations allow you to assign topics to specific brokers?

I can see https://kafka.apache.org/documentation#basic_ops_automigrate.
This should allow you to move things around but does that keep anything
from being re-assigned to the old ones?

On Tue, Dec 6, 2016 at 5:25 PM, Asaf Mesika wrote:

> Why not re-use same cluster? You can assign topics to be live only within a
> specific set of brokers. Thus you have one "bus" for messages, simplifying
> your applications code and configurations


Re: Topic discovery when supporting multiple kafka clusters

2016-12-06 Thread Asaf Mesika
Why not re-use the same cluster? You can assign topics to live only within a
specific set of brokers. Thus you have one "bus" for messages, simplifying
your application code and configuration.
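
One way to restrict a topic to a subset of brokers is a manual partition
reassignment. The sketch below only builds the JSON plan in the format that
kafka-reassign-partitions.sh reads via --reassignment-json-file; the topic
name, partition count, broker ids, and the round-robin placement are
assumptions for illustration, not a recommended layout.

```python
import json


def build_reassignment(topic, num_partitions, broker_ids, replication_factor):
    """Build a reassignment plan that places every replica of `topic`
    on the given brokers, rotating through them round-robin."""
    if replication_factor > len(broker_ids):
        raise ValueError("replication factor exceeds number of target brokers")
    partitions = []
    for p in range(num_partitions):
        # Pick `replication_factor` distinct brokers, starting at offset p.
        replicas = [broker_ids[(p + i) % len(broker_ids)]
                    for i in range(replication_factor)]
        partitions.append({"topic": topic, "partition": p, "replicas": replicas})
    return {"version": 1, "partitions": partitions}


# Hypothetical topic and broker ids.
plan = build_reassignment("large-messages", 4, [5, 6, 7], 2)
print(json.dumps(plan, indent=2))
```

The printed JSON would be saved to a file and applied with the tool's
--execute option. This moves existing partitions only; it does not by
itself decide where partitions of newly created topics are placed.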

On Mon, Dec 5, 2016 at 9:43 PM Yifan Ying wrote:

> Hi,
>
> Initially, we have only one Kafka cluster shared across all teams. But now
> this cluster is very close to out of resources (disk space, # of
> partitions, etc.). So we are considering adding another Kafka cluster. But
> what's the best practice of topic discovery, so that applications know
> which cluster their topics live? We have been using Zookeeper for service
> discovery, maybe it's also good for this purpose?
>
> Thanks
>
> --
> Yifan
>