Re: about consumer group name

2014-03-17 Thread Churu Tang
Hi Guozhang,

I really appreciate the detailed explanation!
Does this mean that the rule “1 partition will be consumed by exactly one 
consumer in the consumer group” will only be checked and ensured on the 
consumer side? Also, does the “consumer group registration in ZooKeeper” need 
to be taken care of by the consumer group itself?
I think I made a mistake in “there is no consumer group name included in any kind 
of request”, because the clientId acts as the logical grouping. Sorry about 
that.

Thanks!

Churu

On Mar 14, 2014, at 5:21 PM, Guozhang Wang wangg...@gmail.com wrote:

 Hi Churu,
 
 Brokers are actually not aware of the consumer groups; the consumer groups
 are maintained within the consumers (and registered in ZK) to achieve load
 balancing. After the group has decided who-consumes-which-partition, each
 consumer will do its fetching independently and send fetch requests to
 the brokers. Brokers, on the other hand, will just blindly respond to the
 fetch requests.
 
 Hope this helps.
 
 Guozhang
 
 
 On Fri, Mar 14, 2014 at 4:50 PM, Churu Tang ct...@rubiconproject.com wrote:
 
 Hi,
 
 Consumers label themselves with a consumer group name, and consumer
 group name should be global across each Kafka cluster. However, when I
 check the API, there is no consumer group name included in any kind of
 request(metadata, produce, fetch, offset). Does the broker know about the
 consumer group name?
 
 Thanks for your time!
 
 Cheers,
 churu
 
 
 
 
 -- 
 -- Guozhang



Wirbelsturm released, 1-click deployments of Kafka clusters

2014-03-17 Thread Michael G. Noll
Hi everyone,

I have released a tool called Wirbelsturm
(https://github.com/miguno/wirbelsturm) that allows you to perform local
and remote deployments of Kafka.  It's also a small way of saying a big
thank you to the Kafka community.

Wirbelsturm uses Vagrant for creating and managing machines, and Puppet
for provisioning the machines once they're up and running.  You can also
use Ansible to interact with deployed machines.  Deploying Kafka is but
one example, of course -- you can deploy other software with Wirbelsturm
as well (e.g. Graphite, Redis, Storm, ZooKeeper).

I also wrote a quick intro and behind-the-scenes blog post at [1], which
covers, for instance, the motivation behind building Wirbelsturm and
lessons learned along the way (read: mistakes made :-P).

Enjoy!
Michael


[1]
http://www.michael-noll.com/blog/2014/03/17/wirbelsturm-one-click-deploy-storm-kafka-clusters-with-vagrant-puppet/







Documentation for metadata.broker.list

2014-03-17 Thread Ryan Berdeen
I recently moved my 0.8.0 cluster to a set of entirely new brokers, and was
surprised to find that the producers did not update their list of brokers
to remove the brokers that were no longer in the cluster. That is, I had
brokers 1,2,3,4,5, added brokers 6,7,8,9,10, waited a day, and stopped
brokers 1,2,3,4,5. After stopping the original brokers, the producers
continued trying to fetch metadata from them, even though they no longer
appeared in the topic metadata. It looks like it was basically the same issue
described in
http://mail-archives.apache.org/mod_mbox/kafka-users/201402.mbox/%3ccadpdzrksmjcpr0-l9txxoegthbjtptamh_fupq3ranwytkp...@mail.gmail.com%3E
.

The documentation for metadata.broker.list says "This is for
bootstrapping and the producer will only use it for getting metadata." I,
and apparently others, have interpreted the "this is for bootstrapping"
part to mean that it is only used for the initial metadata fetch, and not
as the sole list of brokers to use to fetch metadata.

Does it make sense to change the documentation to something like "The list
of brokers used to fetch metadata. These brokers must always be available"?

Am I understanding metadata.broker.list correctly? Is it necessary to load
balance these brokers or otherwise make sure it does not refer directly to
any brokers that could be removed?

Thanks,

Ryan


Re: Request Kafka / ZooKeeper JMS details

2014-03-17 Thread Steven A Robenalt
Hi Arun,

Kafka doesn't provide JMS services. It has some architectural similarities
to JMS, though it is designed to scale to much larger systems. If your app
uses JMS, it will need to be modified to use Kafka Producers and Consumers
in order to work.

Steve



On Mon, Mar 17, 2014 at 1:51 PM, Arun Gujjar arungujjart...@yahoo.com wrote:

 Hi

 I am new to Kafka and ZooKeeper, could you please help with answering the
 below questions..

 We have both Kafka and ZooKeeper installed in DEV and we are
 successfully able to publish and consume messages. The questions I have are:

 1. Can you tell me how I can make sure the JMS config is enabled?
 2. What will be the port number, userId and password for JMS messages so
 that I can log in from the console?
 3. Where can I find the details to call the JMS API from an internal application?

 Your help will be really appreciated.
 Regards,
 Arun







-- 
Steve Robenalt
Software Architect
HighWire | Stanford University
425 Broadway St, Redwood City, CA 94063

srobe...@stanford.edu
http://highwire.stanford.edu


queuing support and javadocs (newbie)

2014-03-17 Thread ashili
We are trying to adopt Kafka for our project and would appreciate comments on 
the following questions. Thank you.

1) Where is the javadoc for Kafka 0.8.1? This would help me look at the available 
API support. Neither Googling nor the Kafka wiki took me to a javadoc.
2) One of the use cases we are looking at is queuing support, where system 
A (producer) and system B (consumer) would exchange messages under 
intermittent/constrained/limited network availability. From the Kafka wiki, I 
understand it supports batching and compression. My specific questions: i) will 
the producer (A) still be able to write to the queue without throwing exceptions 
when the network connection is down, or should I handle the network exception? ii) can 
someone confirm that the queue is alive and messages on the queue are persisted 
when the network connection doesn't exist? iii) I assume the consumer (B) will be able to 
retrieve messages if the above two (i and ii) conditions are met.
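On question i): as a general pattern (not specific to any one Kafka version), a synchronous send during an outage will surface an exception once the producer's retries are exhausted, so the application can catch it and buffer locally. A hedged sketch, with `transport` standing in for the real producer send call:

```python
import collections

class BufferingSender:
    """Queue messages locally and flush when the network comes back.
    `transport` is any callable that raises on failure (a stand-in for
    the real producer's send)."""
    def __init__(self, transport, max_buffered=10000):
        self.transport = transport
        self.buffer = collections.deque(maxlen=max_buffered)

    def send(self, msg):
        self.buffer.append(msg)
        self.flush()

    def flush(self):
        while self.buffer:
            try:
                self.transport(self.buffer[0])
            except IOError:
                return False   # network still down; keep messages queued
            self.buffer.popleft()
        return True

# simulate an outage followed by recovery
sent, down = [], [True]
def transport(m):
    if down[0]:
        raise IOError("broker unreachable")
    sent.append(m)

s = BufferingSender(transport)
s.send("m1"); s.send("m2")     # buffered locally, not delivered
down[0] = False
s.flush()
print(sent)                    # ['m1', 'm2']
```

Note the bounded deque: an unbounded local buffer during a long outage just moves the failure from the network to memory.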


Re: about consumer group name

2014-03-17 Thread Guozhang Wang
The short answer to both questions is yes.

For the second question, the long answer is: on startup, the
high-level consumer registers itself under its ZK group path, so it
also knows the other members of the group; and when a rebalance is
triggered, it will read the number of its peers, and also the number of
partitions, and make a decision on who-consumes-which. This decision is
made independently on each consumer and is supposed to result in the same
outcome across the consumers, since the decision algorithm is deterministic.
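That deterministic, coordination-free decision can be sketched roughly like this (a simplified model of a range-style split in Python; the function name and exact arithmetic are illustrative, not the actual Kafka internals):

```python
def assign_partitions(consumer_ids, partitions, me):
    """Each consumer runs this independently; because both lists are
    sorted the same way everywhere, all consumers reach the same answer."""
    consumers = sorted(consumer_ids)
    parts = sorted(partitions)
    n_per = len(parts) // len(consumers)
    extra = len(parts) % len(consumers)
    i = consumers.index(me)
    # the first `extra` consumers each take one additional partition
    start = i * n_per + min(i, extra)
    count = n_per + (1 if i < extra else 0)
    return parts[start:start + count]

# three consumers, five partitions: every consumer computes the full map
# and keeps only its own slice -- no coordination messages needed
group = ["c1", "c2", "c3"]
parts = [0, 1, 2, 3, 4]
print({c: assign_partitions(group, parts, c) for c in group})
# {'c1': [0, 1], 'c2': [2, 3], 'c3': [4]}
```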

In the coming Kafka 0.9, this logic will be moved to the brokers so
consumer clients will be thinner: no ZK dependency and no coordination
logic at all.

Guozhang






On Mon, Mar 17, 2014 at 11:43 AM, Churu Tang ct...@rubiconproject.com wrote:

 Hi Guozhang,

 I really appreciate the detailed explanation!
 Does this mean that the rule "1 partition will be consumed by exactly one
 consumer in the consumer group" will only be checked and ensured on the
 consumer side? Also, does the "consumer group registration in ZooKeeper"
 need to be taken care of by the consumer group itself?
 I think I made a mistake in "there is no consumer group name included in any
 kind of request", because the clientId acts as the logical grouping. Sorry
 about that.

 Thanks!

 Churu

 On Mar 14, 2014, at 5:21 PM, Guozhang Wang wangg...@gmail.com wrote:

  Hi Churu,
 
  Brokers are actually not aware of the consumer groups; the consumer groups
  are maintained within the consumers (and registered in ZK) to achieve load
  balancing. After the group has decided who-consumes-which-partition, each
  consumer will do its fetching independently and send fetch requests to
  the brokers. Brokers, on the other hand, will just blindly respond to the
  fetch requests.
 
  Hope this helps.
 
  Guozhang
 
 
  On Fri, Mar 14, 2014 at 4:50 PM, Churu Tang ct...@rubiconproject.com
 wrote:
 
  Hi,
 
  Consumers label themselves with a consumer group name, and consumer
  group name should be global across each Kafka cluster. However, when I
  check the API, there is no consumer group name included in any kind of
  request(metadata, produce, fetch, offset). Does the broker know about
 the
  consumer group name?
 
  Thanks for your time!
 
  Cheers,
  churu
 
 
 
 
  --
  -- Guozhang




-- 
-- Guozhang


Re: New Consumer API discussion

2014-03-17 Thread Neha Narkhede
I'm not quite sure I fully understood your question. The consumer API
exposes a close() method that will shut down the consumer's connections to
all brokers and free up the resources that the consumer uses.

I've updated the javadoc for the new consumer API to include a few examples
of different ways of using the consumer. Probably you might find it useful
-
http://people.apache.org/~nehanarkhede/kafka-0.9-consumer-javadoc/doc/kafka/clients/consumer/KafkaConsumer.html
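For the scenario described (consume in one thread, shut down from another), the threading structure looks roughly like this; `messages` and the processing step are stand-ins for the real consumer stream, so this is a sketch of the shutdown hand-off only:

```python
import threading, queue

stop = threading.Event()
messages = queue.Queue()   # stand-in for the consumer's message stream

def consume_loop():
    # poll with a short timeout so the thread notices the stop flag
    # instead of blocking forever on an empty stream
    while not stop.is_set():
        try:
            msg = messages.get(timeout=0.1)
        except queue.Empty:
            continue
        # ... process msg ...

t = threading.Thread(target=consume_loop)
t.start()
# elsewhere -- e.g. a signal handler or admin endpoint -- request shutdown;
# in the real client this is where close() would be called
stop.set()
t.join(timeout=5)
print(t.is_alive())  # False: the consumer thread exited cleanly
```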

Thanks,
Neha


On Sun, Mar 16, 2014 at 7:55 PM, Shanmugam, Srividhya 
srividhyashanmu...@fico.com wrote:

 Can the consumer API provide a way to shut down the connector by doing a
 look-up by the consumer group Id? For example, the application may be consuming
 the messages in one thread whereas the shutdown call can be initiated in a
 different thread.




Re: Documentation for metadata.broker.list

2014-03-17 Thread Jay Kreps
Currently metadata.broker.list is a separate list, unrelated to the
cluster, which is used for all metadata requests.
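A toy model of that behavior (illustrative Python, not the actual producer code): metadata refreshes only ever contact the configured bootstrap list, regardless of which brokers have been discovered since, which is exactly why retiring every broker in the list breaks refreshes.

```python
class MetadataFetcher:
    """Models the 0.8 producer: metadata refreshes always go to the
    static bootstrap list, never to brokers discovered from metadata."""
    def __init__(self, bootstrap_brokers):
        self.bootstrap = list(bootstrap_brokers)  # metadata.broker.list
        self.live_brokers = []                    # learned from responses

    def refresh(self, reachable):
        # try only the configured bootstrap brokers, in order
        for b in self.bootstrap:
            if b in reachable:
                self.live_brokers = sorted(reachable)
                return True
        return False  # every bootstrap broker is gone -> refresh fails

f = MetadataFetcher(["b1", "b2"])
print(f.refresh({"b1", "b2", "b6"}))  # True: b1 answers the metadata request
print(f.refresh({"b6", "b7"}))        # False: the listed brokers are retired,
                                      # even though b6/b7 are alive
```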

We are actually just now discussing whether this makes sense or not with
respect to the new producer and consumer we are working on. I actually
misunderstood how the existing producer worked so in the producer rewrite I
made it work the way you describe. We are currently trying to figure out
which of these is better.

-Jay


On Mon, Mar 17, 2014 at 1:29 PM, Ryan Berdeen rberd...@hubspot.com wrote:

 I recently moved my 0.8.0 cluster to a set of entirely new brokers, and was
 surprised to find that the producers did not update their list of brokers
 to remove the brokers that were no longer in the cluster. That is, I had
 brokers 1,2,3,4,5, added brokers 6,7,8,9,10, waited a day, and stopped
 brokers 1,2,3,4,5. After stopping the original brokers, the producers
 continued trying to fetch metadata from them, even though they no longer
 appeared in the topic metadata. It looks like it was basically the same issue
 described in

 http://mail-archives.apache.org/mod_mbox/kafka-users/201402.mbox/%3ccadpdzrksmjcpr0-l9txxoegthbjtptamh_fupq3ranwytkp...@mail.gmail.com%3E
 .

 The documentation for metadata.broker.list says "This is for
 bootstrapping and the producer will only use it for getting metadata." I,
 and apparently others, have interpreted the "this is for bootstrapping"
 part to mean that it is only used for the initial metadata fetch, and not
 as the sole list of brokers to use to fetch metadata.

 Does it make sense to change the documentation to something like "The list
 of brokers used to fetch metadata. These brokers must always be available"?

 Am I understanding metadata.broker.list correctly? Is it necessary to load
 balance these brokers or otherwise make sure it does not refer directly to
 any brokers that could be removed?

 Thanks,

 Ryan



Re: about consumer group name

2014-03-17 Thread Churu Tang
Thanks Guozhang!

Cheers,
Churu

On Mar 17, 2014, at 3:48 PM, Guozhang Wang wangg...@gmail.com wrote:

 The short answer to both questions is yes.
 
 For the second question, the long answer is: on startup, the
 high-level consumer registers itself under its ZK group path, so it
 also knows the other members of the group; and when a rebalance is
 triggered, it will read the number of its peers, and also the number of
 partitions, and make a decision on who-consumes-which. This decision is
 made independently on each consumer and is supposed to result in the same
 outcome across the consumers, since the decision algorithm is deterministic.
 
 In the coming Kafka 0.9, this logic will be moved to the brokers so
 consumer clients will be thinner: no ZK dependency and no coordination
 logic at all.
 
 Guozhang
 
 
 
 
 
 
 On Mon, Mar 17, 2014 at 11:43 AM, Churu Tang ct...@rubiconproject.com wrote:
 
 Hi Guozhang,
 
 I really appreciate the detailed explanation!
 Does this mean that the rule "1 partition will be consumed by exactly one
 consumer in the consumer group" will only be checked and ensured on the
 consumer side? Also, does the "consumer group registration in ZooKeeper"
 need to be taken care of by the consumer group itself?
 I think I made a mistake in "there is no consumer group name included in any
 kind of request", because the clientId acts as the logical grouping. Sorry
 about that.
 
 Thanks!
 
 Churu
 
 On Mar 14, 2014, at 5:21 PM, Guozhang Wang wangg...@gmail.com wrote:
 
 Hi Churu,
 
 Brokers are actually not aware of the consumer groups; the consumer groups
 are maintained within the consumers (and registered in ZK) to achieve load
 balancing. After the group has decided who-consumes-which-partition, each
 consumer will do its fetching independently and send fetch requests to
 the brokers. Brokers, on the other hand, will just blindly respond to the
 fetch requests.
 
 Hope this helps.
 
 Guozhang
 
 
 On Fri, Mar 14, 2014 at 4:50 PM, Churu Tang ct...@rubiconproject.com
 wrote:
 
 Hi,
 
 Consumers label themselves with a consumer group name, and consumer
 group name should be global across each Kafka cluster. However, when I
 check the API, there is no consumer group name included in any kind of
 request(metadata, produce, fetch, offset). Does the broker know about
 the
 consumer group name?
 
 Thanks for your time!
 
 Cheers,
 churu
 
 
 
 
 --
 -- Guozhang
 
 
 
 
 -- 
 -- Guozhang



Re: Hardware planning

2014-03-17 Thread Carlile, Ken
OK, I understand. So for the Zookeeper cluster, can I go with something like: 

3 x Dell R320: 
Single hexcore 2.5GHz Xeon, 32GB RAM, 4x10K 300GB SAS drives, 10GbE

and if I do, can I drop the CPU specs on the broker machines to, say, dual 
6-cores? Or are we looking at something that is core-bound here? 

Thanks, 
Ken

On Mar 15, 2014, at 11:09 AM, Ray Rodriguez rayrod2...@gmail.com wrote:

 Imagine a situation where one of your nodes running a Kafka broker and a
 ZooKeeper node goes down. You now have to contend with two distributed
 systems that need to do leader election and consensus (in the case of a
 ZooKeeper ensemble) and partition rebalancing/repair (in the case of a Kafka
 cluster). So I think Jun's point is that, when running distributed systems,
 try to isolate them from running on the same node as much as possible, to
 achieve better fault tolerance and high availability.
 
 From the Kafka docs you can see that a ZooKeeper cluster doesn't need to sit
 on very powerful hardware to be reliable, so I believe the suggestion is to
 run a small independent ZooKeeper cluster that will be used by Kafka. By all
 means don't hesitate to reuse that ZooKeeper ensemble for other systems, as
 long as you can guarantee that all the systems using the ZK ensemble use
 some form of znode root to keep their data separated within the ZooKeeper
 znode directory structure.
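For reference, the znode-root isolation mentioned above is configured with a chroot suffix on the ZooKeeper connect string (host names here are placeholders):

```properties
# each application appends its own chroot path to the shared ensemble
zookeeper.connect=zk1:2181,zk2:2181,zk3:2181/kafka
```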
 
 This is an interesting topic, and I'd love to hear whether anyone else is
 running their ZK alongside their Kafka brokers in production.
 
 Ray
 
 
 On Sat, Mar 15, 2014 at 10:28 AM, Carlile, Ken 
 carli...@janelia.hhmi.org wrote:
 
 I'd rather not purchase dedicated hardware for ZK if I don't absolutely
 have to, unless I can use it for multiple clusters (i.e. Kafka, HBase, other
 things that rely on ZK). Would adding more cores help with ZK on the same
 machine? Or is that just a waste of cores, considering that it's Java under
 all of this?
 
 --Ken
 
 On Mar 15, 2014, at 12:07 AM, Jun Rao jun...@gmail.com wrote:
 
 The spec looks reasonable. If you have other machines, it may be better
 to
 put ZK on its own machines.
 
 Thanks,
 
 Jun
 
 
 On Fri, Mar 14, 2014 at 10:52 AM, Carlile, Ken 
  carli...@janelia.hhmi.org wrote:
 
 Hi all,
 
 I'm looking at setting up a (small) Kafka cluster for streaming
 microscope
 data to Spark-Streaming.
 
 The producer would be a single Windows 7 machine with a 1Gb or 10Gb
 ethernet connection running http posts from Matlab (this bit is a little
 fuzzy, and I'm not the user, I'm an admin), the consumer would be 10-60
 (or
 more) Linux nodes running Spark-Streaming with 10Gb ethernet
 connections.
 Target data rate per the user is 200MB/sec, although I can see this
 scaling in the future.
 
 Based on the documentation, my initial thoughts were as follows:
 
 3 nodes, all running ZK and the broker
 
 Dell R620
 2x8 core 2.6GHz Xeon
 256GB RAM
 8x300GB 15K SAS drives (OS runs on 2, ZK on 1, broker on the last 5)
 10Gb ethernet (single port)
 
 Do these specs make sense? Am I over or under-speccing in any of the
 areas? It made sense to me to make the filesystem cache as large as
 possible, particularly when I'm dealing with a small number of brokers.
 
 Thanks,
 Ken Carlile
 Senior Unix Engineer, Scientific Computing Systems
 Janelia Farm Research Campus, HHMI
 
 
 



Producer load balancing

2014-03-17 Thread Abhinav Anand
Hi,
 I am using the Kafka producer 0.8. Each producer seems to be sending
messages only to a specific broker until the metadata refresh. Also, I find each
producer thread connected to only one broker at a time.

I had read that producers send messages in round-robin fashion. Is there
some specific configuration to enable round-robin message delivery?

-- 
Abhinav Anand


Re: Producer load balancing

2014-03-17 Thread Neha Narkhede
The behavior that you described is explained here -
https://cwiki.apache.org/confluence/display/KAFKA/FAQ#FAQ-Whyisdatanotevenlydistributedamongpartitionswhenapartitioningkeyisnotspecified?

Thanks,
Neha
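For readers who don't follow the link: with no partitioning key, the 0.8 producer picks a random partition and sticks to it until the next topic metadata refresh, which makes a producer look pinned to one broker. A toy model of that stickiness (illustrative Python, not the actual partitioner code; the counter stands in for the time-based refresh interval):

```python
import random

class StickyPartitioner:
    """Unkeyed sends reuse one randomly chosen partition until the
    (modeled) metadata refresh interval elapses -- so short-lived
    producers appear to talk to only a single broker."""
    def __init__(self, num_partitions, refresh_every=600):
        self.n = num_partitions
        self.refresh_every = refresh_every  # models topic.metadata.refresh.interval.ms
        self.sends = 0
        self.current = random.randrange(self.n)

    def partition(self):
        if self.sends and self.sends % self.refresh_every == 0:
            self.current = random.randrange(self.n)  # re-pick on "refresh"
        self.sends += 1
        return self.current

p = StickyPartitioner(8, refresh_every=3)
print([p.partition() for _ in range(6)])  # e.g. [5, 5, 5, 2, 2, 2]
```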


On Mon, Mar 17, 2014 at 6:26 PM, Abhinav Anand ab.rv...@gmail.com wrote:

 Hi,
  I am using the Kafka producer 0.8. Each producer seems to be sending
 messages only to a specific broker until the metadata refresh. Also, I find each
 producer thread connected to only one broker at a time.

 I had read that producers send messages in round-robin fashion. Is there
 some specific configuration to enable round-robin message delivery?

 --
 Abhinav Anand