Re: Distribute data from Kafka evenly on cluster

2014-07-18 Thread Chen Song
Speaking of this, I have another related question. In my Spark Streaming job, I set up multiple consumers to receive data from Kafka, with each worker reading from one partition. Initially, Spark is smart enough to assign each worker to a different partition, so that data consumption is distributed.

Re: Distribute data from Kafka evenly on cluster

2014-07-18 Thread Tobias Pfeiffer
Hi, as far as I know, a rebalance is triggered by Kafka in order to distribute partitions evenly, that is, to achieve the opposite of what you are seeing. I think it would be interesting to check the Kafka logs for the result of the rebalance operation and for why you see what you are seeing. I know
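To make "distribute partitions evenly" concrete, here is a minimal sketch of a range-style assignment, assuming a strategy like the "range" default of Kafka's ZooKeeper-based high-level consumer: partitions and consumer IDs are sorted, each consumer gets a contiguous block, and the first few consumers take one extra partition when the counts do not divide evenly. The function name `range_assign` and the consumer IDs are hypothetical; this is an illustration, not Kafka's actual implementation.

```python
def range_assign(partitions, consumers):
    """Hypothetical sketch of range-style partition assignment.

    Sorts partitions and consumer IDs, then hands each consumer a
    contiguous slice of partitions. The first `extra` consumers each
    receive one more partition than the rest.
    """
    parts = sorted(partitions)
    members = sorted(consumers)
    if not members:
        return {}
    per_member, extra = divmod(len(parts), len(members))
    assignment = {}
    start = 0
    for i, member in enumerate(members):
        # Consumers earlier in sorted order absorb the remainder.
        count = per_member + (1 if i < extra else 0)
        assignment[member] = parts[start:start + count]
        start += count
    return assignment


print(range_assign([0, 1, 2, 3], ["worker-1", "worker-2", "worker-3"]))
print(range_assign([0, 1], ["worker-1", "worker-2", "worker-3"]))
```

The first call spreads four partitions as `[0, 1]`, `[2]`, `[3]`. The second shows that with more consumers than partitions, an "even" assignment still leaves one consumer with no partition at all.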

Re: Distribute data from Kafka evenly on cluster

2014-07-04 Thread Tobias Pfeiffer
Hi, unfortunately, when I take the above approach, I run into this problem: http://mail-archives.apache.org/mod_mbox/kafka-users/201401.mbox/%3ccabtfevyxvtaqvnmvwmh7yscfgxpw5kmrnw_gnq72cy4oa1b...@mail.gmail.com%3E That is, a NoNode error in ZooKeeper when rebalancing. The Kafka receiver will retry

Re: Distribute data from Kafka evenly on cluster

2014-06-28 Thread Mayur Rustagi
How about this? https://groups.google.com/forum/#!topic/spark-users/ntPQUZFJt4M

Distribute data from Kafka evenly on cluster

2014-06-27 Thread Tobias Pfeiffer
Hi, I have a number of questions about using the Kafka receiver of Spark Streaming. Maybe someone has more experience with it and can help me out. I have set up an environment for getting to know Spark, consisting of - a Mesos cluster with 3 slave-only nodes and 3 combined master-and-slave nodes, - 2 Kafka nodes,