Hi,

unfortunately, when I follow the above approach, I run into this problem:

http://mail-archives.apache.org/mod_mbox/kafka-users/201401.mbox/%3ccabtfevyxvtaqvnmvwmh7yscfgxpw5kmrnw_gnq72cy4oa1b...@mail.gmail.com%3E
That is, a NoNode error in Zookeeper when rebalancing. The Kafka receiver
retries again and again, but eventually fails, leading to unprocessed data
and, worse, a task that never terminates. There is nothing exotic about my
setup (one Zookeeper node, one Kafka broker), so I am wondering whether
other people have seen this error before and, more importantly, how to fix
it. When I don't use the approach of multiple kafkaStreams, I don't get
this error, but then the work is never distributed across my cluster
either...
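
For what it's worth, here is a sketch of what I am trying at the moment.
The rebalance.max.retries and rebalance.backoff.ms settings are standard
Kafka 0.8 high-level consumer properties, but raising them is only my
guess at a workaround for the NoNode error, not a confirmed fix; the
decoders and storage level are likewise just examples:

  import kafka.serializer.StringDecoder
  import org.apache.spark.storage.StorageLevel
  import org.apache.spark.streaming.kafka.KafkaUtils

  // Give the high-level consumer more room to retry the rebalance
  // before giving up (guesswork, not a confirmed fix).
  val kafkaParams = Map(
    "zookeeper.connect" -> zkQuorum,
    "group.id" -> groupId,
    "rebalance.max.retries" -> "10",
    "rebalance.backoff.ms" -> "10000")

  // One receiver per stream, as in the approach quoted below.
  val kafkaInputs = (1 to N).map { _ =>
    KafkaUtils.createStream[String, String, StringDecoder, StringDecoder](
      ssc, kafkaParams, Map("topic" -> 1), StorageLevel.MEMORY_AND_DISK_SER)
  }
  val union = ssc.union(kafkaInputs)

The only alternative I can think of is a single receiver whose output is
redistributed with DStream.repartition(), which avoids the rebalancing
between multiple consumers entirely, at the price of a single ingestion
point:

  // N consumer threads inside one receiver; numPartitions is a
  // placeholder for the level of parallelism wanted downstream.
  val single = KafkaUtils.createStream(ssc, zkQuorum, groupId, Map("topic" -> N))
  val distributed = single.repartition(numPartitions)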

Thanks
Tobias


On Thu, Jul 3, 2014 at 11:58 AM, Tobias Pfeiffer <t...@preferred.jp> wrote:

> Thank you very much for the link, that was very helpful!
>
> So, apparently the `topics: Map[String, Int]` parameter controls the
> number of consumer threads used to receive data for each topic; the number N in
>
>   val kafkaInputs = (1 to N).map { _ =>
>     ssc.kafkaStream(zkQuorum, groupId, Map("topic" -> 1))
>   }
>   val union = ssc.union(kafkaInputs)
>
> controls how many connections are made to Kafka. Note that the number of
> Kafka partitions for that topic must be at least N for this to work.
>
> Thanks
> Tobias
>
