Re: Load Balancing Kafka
I think the answer here is that the Kafka protocol includes a broker metadata API. The client uses the broker host(s) you provide to discover the full list of brokers in the cluster (and the topics and partitions each one manages or leads). The Java client has a similar interface via metadata.broker.list / bootstrap.servers.

-Dana
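Dana's point is that a single bootstrap host is enough: the client asks it for metadata describing the whole cluster, then sends each write directly to the broker that leads the target partition. The toy model below illustrates that flow. It is a sketch, not any client's real code: `FakeCluster`, `fetch_metadata`, `bootstrap` and `leader_for` are hypothetical names, and the real metadata request is a binary protocol call, not a Python method. It also hints at why a round-robin proxy in front of the brokers helps little beyond the bootstrap step, since later writes are routed by partition leadership.

```python
from dataclasses import dataclass, field

# Hypothetical types for this sketch only.
BrokerMap = dict[int, str]              # broker id -> "host:port"
LeaderMap = dict[tuple[str, int], int]  # (topic, partition) -> leader broker id


@dataclass
class FakeCluster:
    """Stand-in for a real cluster: every live broker can serve metadata."""
    brokers: BrokerMap
    leaders: LeaderMap
    down: set[str] = field(default_factory=set)  # unreachable "host:port"s

    def fetch_metadata(self, host: str) -> tuple[BrokerMap, LeaderMap]:
        # Models the metadata request: ask ANY broker, learn the WHOLE cluster.
        if host in self.down or host not in self.brokers.values():
            raise ConnectionError(f"{host} is unreachable")
        return dict(self.brokers), dict(self.leaders)


def bootstrap(cluster: FakeCluster, bootstrap_hosts: list[str]) -> tuple[BrokerMap, LeaderMap]:
    """Try bootstrap hosts in order; the first one that answers is enough."""
    for host in bootstrap_hosts:
        try:
            return cluster.fetch_metadata(host)
        except ConnectionError:
            continue  # this host is down, try the next one
    raise RuntimeError("none of the bootstrap hosts answered")


def leader_for(metadata: tuple[BrokerMap, LeaderMap], topic: str, partition: int) -> str:
    """Address the client should send writes for (topic, partition) to."""
    brokers, leaders = metadata
    return brokers[leaders[(topic, partition)]]


if __name__ == "__main__":
    cluster = FakeCluster(
        brokers={1: "kafka1:9092", 2: "kafka2:9092", 3: "kafka3:9092"},
        leaders={("events", 0): 1, ("events", 1): 2, ("events", 2): 3},
    )
    # One bootstrap host is enough to learn about all three brokers.
    meta = bootstrap(cluster, ["kafka2:9092"])
    print(leader_for(meta, "events", 0))  # kafka1:9092
```

Listing more than one bootstrap host, as `bootstrap` allows, only matters if the first host happens to be down when the client starts.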
Re: Load Balancing Kafka
Greetings Sandy,

Folks smarter than me can correct me if I am wrong. With the Python client you don't have to connect to ZooKeeper, so specifying just one of the brokers should be sufficient.

As your client produces messages, they should be randomly assigned to a partition of the topic you specify, unless you use keyed messages, which are sent to a particular partition based on the key: http://kafka.apache.org/documentation.html#theproducer How keys are actually mapped to particular partitions is beyond my realm of knowledge.

I suspect the concern is flooding one broker with messages while the others are underutilized. I believe Kafka's architecture ensures that, while only one broker is the leader for a particular partition and takes the writes for that partition, the other brokers holding replicas of that partition will eventually be in sync with the leader. So I don't think you need to worry about sending your messages to a VIP and manually load-balancing where messages end up, even if your messages are assigned to partitions randomly.

hth!

Terry Bates

On Wed, Jul 15, 2015 at 5:19 PM, Sandy Waters sandy.watermell...@gmail.com wrote:
> Hi all, Do I need to load balance against the brokers? I am using the python driver and it seems to only want a single kafka broker host.
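The keyed-versus-unkeyed behaviour Terry describes can be sketched as a tiny partitioner. This is an illustration under assumptions, not any client's actual partitioner: `choose_partition` is a hypothetical name, and `zlib.crc32` is a stand-in hash (real clients use their own hash function; the Java producer's default partitioner uses murmur2, for example). Unkeyed messages are spread randomly; keyed messages go to `hash(key) % num_partitions`, so the same key always lands in the same partition.

```python
import random
import zlib
from typing import Optional


def choose_partition(key: Optional[bytes], num_partitions: int, rng: random.Random) -> int:
    """Pick a partition for one message (illustrative only).

    - No key: choose a partition at random so load spreads across partitions.
    - Key: stable hash of the key modulo the partition count, so every
      message with the same key lands in the same partition.
    zlib.crc32 is a stand-in for whatever hash a real client uses.
    """
    if num_partitions <= 0:
        raise ValueError("num_partitions must be positive")
    if key is None:
        return rng.randrange(num_partitions)
    return zlib.crc32(key) % num_partitions


if __name__ == "__main__":
    rng = random.Random(0)
    # Same key -> same partition, every time.
    print(choose_partition(b"user-42", 6, rng) == choose_partition(b"user-42", 6, rng))  # True
    # Unkeyed messages are spread over all partitions.
    counts = [0] * 6
    for _ in range(6000):
        counts[choose_partition(None, 6, rng)] += 1
    print(counts)
```

One consequence of a modulo scheme like this: adding partitions to a topic changes which partition an existing key maps to.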
Re: Load Balancing Kafka
If you have pretty balanced traffic on each partition and have set auto.leader.rebalance.enable to true, you might not need to do any further workload balancing. However, in most cases you will probably still need some sort of load balancing based on the traffic and disk utilization of each broker. You might want to do leader migration and/or partition reassignment. Leader migration is the cheaper rebalance and mostly addresses CPU and network imbalance. Partition reassignment is a much more expensive operation because it moves actual data, but it can help with disk utilization in addition to CPU and network.

Thanks,
Jiangjie (Becket) Qin

On 7/15/15, 5:19 PM, Sandy Waters sandy.watermell...@gmail.com wrote:
> Hi all, Do I need to load balance against the brokers? I am using the python driver and it seems to only want a single kafka broker host.
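Becket's distinction between the two operations can be made concrete with a small sketch. Leader migration only chooses a different leader among a partition's existing replicas, so no data is copied; partition reassignment changes the replica sets themselves. The greedy heuristic below (`migrate_leaders`, hottest partition first, least-loaded replica wins) is a hypothetical illustration, not Kafka's algorithm, and the traffic numbers are made up. In Kafka itself, leadership is moved through preferred replica election, and replica sets are changed with the kafka-reassign-partitions.sh tool.

```python
from collections import defaultdict

# Hypothetical inputs: partition -> replica broker ids, partition -> write rate (MB/s).
Replicas = dict[str, list[int]]
Traffic = dict[str, float]


def leader_load(leaders: dict[str, int], traffic: Traffic) -> dict[int, float]:
    """Write traffic each broker serves as partition leader."""
    load = defaultdict(float)
    for partition, broker in leaders.items():
        load[broker] += traffic[partition]
    return dict(load)


def migrate_leaders(replicas: Replicas, traffic: Traffic) -> dict[str, int]:
    """Greedy leader migration (illustrative heuristic only).

    Hottest partitions first; each gets as leader whichever of its EXISTING
    replicas carries the least leader traffic so far. Replica sets are never
    changed, so no partition data is copied: that is what makes leader
    migration cheap compared with partition reassignment.
    """
    load = defaultdict(float)
    leaders: dict[str, int] = {}
    for partition in sorted(traffic, key=traffic.get, reverse=True):
        # Ties are broken by lowest broker id so the result is deterministic.
        broker = min(replicas[partition], key=lambda b: (load[b], b))
        leaders[partition] = broker
        load[broker] += traffic[partition]
    return leaders


if __name__ == "__main__":
    replicas = {"t-0": [1, 2], "t-1": [1, 3], "t-2": [1, 2], "t-3": [2, 3]}
    traffic = {"t-0": 40.0, "t-1": 30.0, "t-2": 20.0, "t-3": 10.0}  # made-up MB/s
    before = {"t-0": 1, "t-1": 1, "t-2": 1, "t-3": 2}  # broker 1 leads almost everything
    print(leader_load(before, traffic))  # {1: 90.0, 2: 10.0}
    after = migrate_leaders(replicas, traffic)
    print(leader_load(after, traffic))   # {1: 40.0, 3: 30.0, 2: 30.0}
```

Note what this cannot fix: if one broker's disk is full, moving leadership changes nothing on disk, which is where the more expensive reassignment comes in.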
Re: Load Balancing Kafka
Ah... It seems you are focusing more on producer-side workload balance... If that is the case, please ignore my previous comments.

Jiangjie (Becket) Qin

On 7/15/15, 6:01 PM, Jiangjie Qin j...@linkedin.com wrote:
> You might want to do leader migration and/or partition reassignment.
Load Balancing Kafka
Hi all,

Do I need to load balance against the brokers? I am using the Python driver and it seems to only want a single Kafka broker host. However, in a situation where I have 10 brokers, is it still fine to just give it one host? Do ZooKeeper and Kafka handle the load balancing and redirect my push somewhere else? Would it hurt if I load balanced with Nginx and had it do round robin to the brokers?

Many thanks for any help.
-Sandy