KafkaUtils not consuming all the data from all partitions

2015-01-07 Thread Mukesh Jha
Hi Guys, I have a kafka topic having 90 partitions and I running SparkStreaming(1.2.0) to read from kafka via KafkaUtils to create 10 kafka-receivers. My streaming is running fine and there is no delay in processing, just that some partitions data is never getting picked up. From the kafka

Re: KafkaUtils not consuming all the data from all partitions

2015-01-07 Thread Gerard Maas
Hi, Could you add the code where you create the Kafka consumer? -kr, Gerard. On Wed, Jan 7, 2015 at 3:43 PM, francois.garil...@typesafe.com wrote: Hi Mukesh, If my understanding is correct, each Stream only has a single Receiver. So, if you have each receiver consuming 9 partitions, you

Re: KafkaUtils not consuming all the data from all partitions

2015-01-07 Thread francois . garillot
Hi Mukesh, If my understanding is correct, each Stream only has a single Receiver. So, if you have each receiver consuming 9 partitions, you need 10 input DStreams to create 10 concurrent receivers:

Re: KafkaUtils not consuming all the data from all partitions

2015-01-07 Thread Mukesh Jha
I understand that I've to create 10 parallel streams. My code is running fine when the no of partitions is ~20, but when I increase the no of partitions I keep getting in this issue. Below is my code to create kafka streams, along with the configs used. MapString, String kafkaConf = new

Re: KafkaUtils not consuming all the data from all partitions

2015-01-07 Thread francois . garillot
- You are launching up to 10 threads/topic per Receiver. Are you sure your receivers can support 10 threads each ? (i.e. in the default configuration, do they have 10 cores). If they have 2 cores, that would explain why this works with 20 partitions or less. - If you have 90 partitions, why

Re: KafkaUtils not consuming all the data from all partitions

2015-01-07 Thread Gerard Maas
AFAIK, there're two levels of parallelism related to the Spark Kafka consumer: At JVM level: For each receiver, one can specify the number of threads for a given topic, provided as a map [topic - nthreads]. This will effectively start n JVM threads consuming partitions of that kafka topic. At