Re: about consumer group name
Hi Guozhang, I really appreciate the detailed explanation! Does this mean that the rule "1 partition will be consumed by exactly one consumer in the consumer group" will only be checked and ensured on the consumer side? Also, does the "consumer group registration in ZooKeeper" need to be taken care of by the consumer group itself? I think I made a mistake in "there is no consumer group name included in any kind of request", because the clientId acts as the logical grouping. Sorry about that. Thanks! Churu

On Mar 14, 2014, at 5:21 PM, Guozhang Wang wangg...@gmail.com wrote:

Hi Churu, Brokers are actually not aware of consumer groups; the consumer groups are maintained within the consumers (and registered in ZK) to achieve load balance. After the group has decided who-consumes-which-partition, each consumer will do its fetching independently and send fetch requests to the brokers. Brokers, on the other hand, will just blindly respond to the fetch requests. Hope this helps. Guozhang

On Fri, Mar 14, 2014 at 4:50 PM, Churu Tang ct...@rubiconproject.com wrote:

Hi, Consumers label themselves with a consumer group name, and the consumer group name should be global across each Kafka cluster. However, when I check the API, there is no consumer group name included in any kind of request (metadata, produce, fetch, offset). Does the broker know about the consumer group name? Thanks for your time! Cheers, churu

-- Guozhang
Wirbelsturm released, 1-click deployments of Kafka clusters
Hi everyone,

I have released a tool called Wirbelsturm (https://github.com/miguno/wirbelsturm) that allows you to perform local and remote deployments of Kafka. It's also a small way of saying a big thank you to the Kafka community.

Wirbelsturm uses Vagrant for creating and managing machines, and Puppet for provisioning the machines once they're up and running. You can also use Ansible to interact with deployed machines. Deploying Kafka is but one example, of course -- you can deploy other software with Wirbelsturm as well (e.g. Graphite, Redis, Storm, ZooKeeper).

I also wrote a quick intro and behind-the-scenes blog post at [1], which covers, for instance, the motivation behind building Wirbelsturm and lessons learned along the way (read: mistakes made :-P).

Enjoy!
Michael

[1] http://www.michael-noll.com/blog/2014/03/17/wirbelsturm-one-click-deploy-storm-kafka-clusters-with-vagrant-puppet/
Documentation for metadata.broker.list
I recently moved my 0.8.0 cluster to a set of entirely new brokers, and was surprised to find that the producers did not update their list of brokers to remove the brokers that were no longer in the cluster. That is, I had brokers 1,2,3,4,5, added brokers 6,7,8,9,10, waited a day, and stopped brokers 1,2,3,4,5. After stopping the original brokers, the producers continued trying to fetch metadata from them, even though they no longer appeared in the topic metadata. It looks like it was basically the same issue described in http://mail-archives.apache.org/mod_mbox/kafka-users/201402.mbox/%3ccadpdzrksmjcpr0-l9txxoegthbjtptamh_fupq3ranwytkp...@mail.gmail.com%3E .

The documentation for metadata.broker.list says "This is for bootstrapping and the producer will only use it for getting metadata." I, and apparently others, have interpreted the "this is for bootstrapping" part to mean that it is only used for the initial metadata fetch, and not as the sole list of brokers used to fetch metadata.

Does it make sense to change the documentation to something like "The list of brokers used to fetch metadata. These brokers must always be available"? Am I understanding metadata.broker.list correctly? Is it necessary to load balance these brokers or otherwise make sure the list does not refer directly to any brokers that could be removed?

Thanks, Ryan
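For what it's worth, under the reading Ryan describes, the safest setup is to point the property at addresses that outlive any individual broker. A hedged sketch of a 0.8 producer config under that assumption (host name is an illustrative placeholder, not from this thread):

```properties
# Sketch only: if metadata.broker.list is the sole source of metadata
# endpoints (the behavior observed above), point it at stable addresses
# such as a VIP or DNS alias rather than at individual broker hosts.
metadata.broker.list=kafka-vip.example.com:9092
```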
Re: Request Kafka / ZooKeeper JMS details
Hi Arun,

Kafka doesn't provide JMS services. It has some architectural similarities with JMS, though Kafka is designed to scale to much larger systems than JMS. If your app uses JMS, it will need to be modified to use Kafka producers and consumers in order to work.

Steve

On Mon, Mar 17, 2014 at 1:51 PM, Arun Gujjar arungujjart...@yahoo.com wrote:

Hi, I am new to Kafka and ZooKeeper; could you please help with answering the questions below? We have both Kafka and ZooKeeper installed in DEV and we are successfully able to publish and consume messages. The questions I have are: 1. Can you tell me how I can make sure JMS config is enabled? 2. What will be the port number, userId and password for JMS messages so that I can log in from a console? 3. Where can I find the details to call the JMS API from an internal application? Your help will be really appreciated. Regards, Arun

-- Steve Robenalt, Software Architect, HighWire | Stanford University, 425 Broadway St, Redwood City, CA 94063, srobe...@stanford.edu, http://highwire.stanford.edu
queuing support and javadocs (newbie)
We are trying to adapt Kafka for our project and would appreciate comments on the following questions. Thank you!

1) Where is the javadoc for Kafka 0.8.1? It would help me look at the available API support. Neither Googling nor the Kafka wiki took me to a javadoc.

2) One of the use cases we are looking at is queuing support, where system A (producer) and system B (consumer) would exchange messages under intermittent/constrained/limited network availability. From the Kafka wiki, I understand it supports batching and compression. My specific questions:

i) Will the producer (A) still be able to write to the queue without throwing exceptions when the network connection is down, or should I handle the network exception?
ii) Can someone confirm that the queue is alive and messages on the queue are persisted when a network connection doesn't exist?
iii) I assume the consumer (B) will be able to retrieve messages if the above two (i and ii) conditions are met.
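On question (i): a producer cannot deliver while the broker is unreachable, so the application typically has to catch the send failure and buffer locally. A minimal sketch of that pattern in plain Java (no Kafka classes; the `send` callback stands in for a real producer call and is an assumption, not a Kafka API):

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.Consumer;

// Sketch of handling a network outage on the producer side: attempt the
// send, and on failure keep the message in a local queue for a later retry.
// The Consumer<String> argument is a stand-in for the real send call.
public class BufferingSender {
    private final Deque<String> pending = new ArrayDeque<>();

    // Try to deliver one message; on failure, retain it for retry.
    public boolean trySend(String msg, Consumer<String> send) {
        try {
            send.accept(msg);
            return true;
        } catch (RuntimeException networkDown) {
            pending.addLast(msg);   // hold locally until the link returns
            return false;
        }
    }

    public int pendingCount() { return pending.size(); }
}
```

In real use, `pending` could be drained on a timer once sends start succeeding again, or spilled to disk if the outage may be long.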
Re: about consumer group name
The short answer to both questions is yes.

For the second question, the long answer is: on startup, the high-level consumer registers itself under its ZK group path, so it also knows the other members of the group; and when a rebalance is triggered, it reads the number of peers and the number of partitions, and makes a decision about who consumes which. This decision is made independently on each consumer and is supposed to result in the same assignment across the consumers, since the decision algorithm is deterministic.

In the coming Kafka 0.9, this logic will be moved to the brokers, so consumer clients will be thinner: no ZK dependency and no coordination logic at all.

Guozhang
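Guozhang's point that every consumer independently reaches the same answer can be sketched in plain Java. This is an illustration of a range-style assignment under the assumptions above, not Kafka's actual code; the class and method names are invented:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Sketch: each consumer sorts the member list it read from ZK and computes
// its own slice of partitions. Because the inputs and the algorithm are the
// same everywhere, every consumer reaches the same global assignment with
// no coordination, and each partition is owned by exactly one consumer.
public class RangeAssignSketch {
    // Returns the partitions owned by consumerId for a single topic.
    static List<Integer> assign(List<String> members, int numPartitions, String consumerId) {
        List<String> sorted = new ArrayList<>(members);
        Collections.sort(sorted);                   // same order on every consumer
        int i = sorted.indexOf(consumerId);
        int base = numPartitions / sorted.size();   // partitions per consumer
        int extra = numPartitions % sorted.size();  // first 'extra' members get one more
        int start = i * base + Math.min(i, extra);
        int count = base + (i < extra ? 1 : 0);
        List<Integer> owned = new ArrayList<>();
        for (int p = start; p < start + count; p++) owned.add(p);
        return owned;
    }

    public static void main(String[] args) {
        List<String> group = Arrays.asList("consumer-b", "consumer-a");
        System.out.println(assign(group, 5, "consumer-a")); // [0, 1, 2]
        System.out.println(assign(group, 5, "consumer-b")); // [3, 4]
    }
}
```

Note that the two results partition {0..4} exactly: no partition is shared and none is dropped, which is the "1 partition, exactly one consumer" rule enforced purely on the consumer side.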
Re: New Consumer API discussion
I'm not quite sure if I fully understood your question. The consumer API exposes a close() method that will shut down the consumer's connections to all brokers and free up the resources that the consumer uses.

I've updated the javadoc for the new consumer API to include a few examples of different ways of using the consumer. You might find it useful - http://people.apache.org/~nehanarkhede/kafka-0.9-consumer-javadoc/doc/kafka/clients/consumer/KafkaConsumer.html

Thanks, Neha

On Sun, Mar 16, 2014 at 7:55 PM, Shanmugam, Srividhya srividhyashanmu...@fico.com wrote:

Can the consumer API provide a way to shut down the connector by doing a lookup by the consumer group id? For example, the application may be consuming the messages in one thread whereas the shutdown call can be initiated in a different thread.
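The two-thread scenario Srividhya describes is usually handled by having the polling thread own the consumer (and be the only caller of close()), while the other thread merely signals shutdown. A plain-Java sketch of that pattern, with no Kafka classes; the poll()/close() stand-ins are assumptions about where the real API calls would go:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicBoolean;

// Sketch: one thread loops (where consumer.poll() would run); any other
// thread may request shutdown by flipping the flag. Cleanup (where
// consumer.close() would run) happens on the owning thread only.
public class ShutdownSketch {
    private final AtomicBoolean running = new AtomicBoolean(true);
    private final CountDownLatch closed = new CountDownLatch(1);
    volatile int polls = 0;

    public void runLoop() {
        try {
            while (running.get()) {
                polls++;            // stand-in for consumer.poll(timeout)
            }
        } finally {
            // stand-in for consumer.close(): release connections/resources
            closed.countDown();
        }
    }

    // Safe to call from any thread; blocks until cleanup has finished.
    public void shutdown() throws InterruptedException {
        running.set(false);
        closed.await();
    }
}
```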
Re: Documentation for metadata.broker.list
Currently metadata.broker.list is a separate list, unrelated to the cluster, which is used for all metadata requests. We are actually just now discussing whether this makes sense or not with respect to the new producer and consumer we are working on. I actually misunderstood how the existing producer worked, so in the producer rewrite I made it work the way you describe. We are currently trying to figure out which of these is better.

-Jay
Re: about consumer group name
Thanks Guozhang! Cheers, Churu
Re: Hardware planning
OK, I understand. So for the ZooKeeper cluster, can I go with something like: 3 x Dell R320: single hexcore 2.5GHz Xeon, 32GB RAM, 4 x 10K 300GB SAS drives, 10GbE? And if I do, can I drop the CPU specs on the broker machines to, say, dual 6-cores? Or are we looking at something that is core-bound here?

Thanks, Ken

On Mar 15, 2014, at 11:09 AM, Ray Rodriguez rayrod2...@gmail.com wrote:

Imagine a situation where one of your nodes running a Kafka broker and a ZooKeeper node goes down. You now have to contend with two distributed systems that need to do leader election and consensus (in the case of a ZooKeeper ensemble) and partition rebalancing/repair (in the case of a Kafka cluster). So I think Jun's point is that when running distributed systems, try to isolate them from running on the same node as much as possible, to achieve better fault tolerance and high availability.

From the Kafka docs you can see that a ZooKeeper cluster doesn't need to sit on very powerful hardware to be reliable, so I believe the suggestion is to run a small independent ZooKeeper cluster that will be used by Kafka. And by all means don't hesitate to reuse that ZooKeeper ensemble for other systems, as long as you can guarantee that all the systems using the ZK ensemble use some form of znode root to keep their data separated within the ZooKeeper znode directory structure.

This is an interesting topic and I'd love to hear if anyone else is running their ZK alongside their Kafka brokers in production.

Ray

On Sat, Mar 15, 2014 at 10:28 AM, Carlile, Ken carli...@janelia.hhmi.org wrote:

I'd rather not purchase dedicated hardware for ZK if I don't absolutely have to, unless I can use it for multiple clusters (i.e. Kafka, HBase, other things that rely on ZK). Would adding more cores help with ZK on the same machine? Or is that just a waste of cores, considering that it's Java under all of this? --Ken

On Mar 15, 2014, at 12:07 AM, Jun Rao jun...@gmail.com wrote:

The spec looks reasonable. If you have other machines, it may be better to put ZK on its own machines. Thanks, Jun

On Fri, Mar 14, 2014 at 10:52 AM, Carlile, Ken carli...@janelia.hhmi.org wrote:

Hi all, I'm looking at setting up a (small) Kafka cluster for streaming microscope data to Spark Streaming. The producer would be a single Windows 7 machine with a 1Gb or 10Gb ethernet connection running HTTP posts from Matlab (this bit is a little fuzzy, and I'm not the user, I'm an admin); the consumers would be 10-60 (or more) Linux nodes running Spark Streaming with 10Gb ethernet connections. The target data rate per the user is 200MB/sec, although I can see this scaling in the future. Based on the documentation, my initial thoughts were as follows: 3 nodes, all running ZK and the broker. Dell R620: 2x8 core 2.6GHz Xeon, 256GB RAM, 8x300GB 15K SAS drives (OS runs on 2, ZK on 1, broker on the last 5), 10Gb ethernet (single port). Do these specs make sense? Am I over- or under-speccing in any of the areas? It made sense to me to make the filesystem cache as large as possible, particularly when I'm dealing with a small number of brokers. Thanks, Ken Carlile, Senior Unix Engineer, Scientific Computing Systems, Janelia Farm Research Campus, HHMI
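Ray's suggestion of keeping each system's data under its own znode root when sharing a ZooKeeper ensemble can be sketched as a broker config fragment. This is illustrative only; the host names are placeholders, and the chroot suffix (`/kafka`) is the part that keeps Kafka's data separated:

```properties
# Sketch: a shared 3-node ZooKeeper ensemble, with Kafka confined to its
# own chroot so other systems (e.g. HBase) can use the same ensemble under
# different roots. Host names are placeholders.
zookeeper.connect=zk1.example.com:2181,zk2.example.com:2181,zk3.example.com:2181/kafka
```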
Producer load balancing
Hi, I am using the Kafka 0.8 producer. Each producer seems to be sending messages only to a specific broker until the metadata refresh. Also, I find each producer thread connected to only one broker at a time. I had read that producers send messages in round-robin fashion. Is there some specific configuration to enable round-robin message delivery?

-- Abhinav Anand
Re: Producer load balancing
The behavior that you described is explained here - https://cwiki.apache.org/confluence/display/KAFKA/FAQ#FAQ-Whyisdatanotevenlydistributedamongpartitionswhenapartitioningkeyisnotspecified

Thanks, Neha
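The contrast behind that FAQ entry can be sketched in plain Java: with no partitioning key, the 0.8 producer picks one partition at random and sticks to it until the next metadata refresh (which is why one broker sees all the traffic), whereas a keyed message is hashed to a partition on every send. This is an illustration, not Kafka source; the method names are invented:

```java
import java.util.Arrays;
import java.util.Random;

// Sketch of the two partition-selection behaviors discussed in the FAQ.
public class PartitionChoiceSketch {
    // Keyed path: a DefaultPartitioner-style hash spreads keys over
    // partitions deterministically on every send.
    static int keyedPartition(String key, int numPartitions) {
        return Math.abs(key.hashCode() % numPartitions);
    }

    // Key-less ("sticky") path: one random choice, reused for every send
    // until the next metadata refresh.
    static int[] stickyPartitions(long seed, int numPartitions, int sends) {
        int chosen = new Random(seed).nextInt(numPartitions);
        int[] out = new int[sends];
        Arrays.fill(out, chosen);   // same partition for the whole interval
        return out;
    }
}
```

So the practical fix for Abhinav's observation is to supply a key (or a custom partitioner) if per-message distribution matters more than the connection savings of the sticky behavior.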