You can call .repartition on the DStream created by the Kafka direct consumer. You take the one-time hit of a shuffle per batch, but gain the ability to scale processing out beyond your number of Kafka partitions.
We’re doing this to scale up from 36 partitions per topic to 140 partitions (20 cores * 7 nodes), and it works great.

-adrian

From: varun sharma
Date: Thursday, October 29, 2015 at 8:27 AM
To: user
Subject: Need more tasks in KafkaDirectStream

Right now, there is a one-to-one correspondence between Kafka partitions and Spark partitions. I don't have a requirement for one-to-one semantics; I need more tasks to be generated in the job so that it can be parallelised and the batch can complete faster. In the previous receiver-based approach, the number of tasks created was independent of the Kafka partitions, and I need something like that. Is there any config available if I don't need one-to-one semantics? Is there any way I can repartition without incurring additional cost?

Thanks
VARUN SHARMA
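
[Editor's note: below is a minimal sketch of the repartition approach Adrian describes, assuming the Spark 1.x direct Kafka API (spark-streaming-kafka with StringDecoder); the broker address, topic name, partition counts, and the process() helper are illustrative placeholders, not from the original thread.]

    import kafka.serializer.StringDecoder
    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.kafka.KafkaUtils

    object KafkaDirectRepartition {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf().setAppName("KafkaDirectRepartition")
        val ssc = new StreamingContext(conf, Seconds(10))

        val kafkaParams = Map("metadata.broker.list" -> "broker1:9092")
        val topics = Set("my-topic")

        // Direct stream: one Spark partition per Kafka partition (e.g. 36)
        val directStream = KafkaUtils
          .createDirectStream[String, String, StringDecoder, StringDecoder](
            ssc, kafkaParams, topics)

        // Shuffle once per batch to spread records across more tasks
        // (e.g. 140 = 20 cores * 7 nodes, as in the setup above)
        val repartitioned = directStream.map(_._2).repartition(140)

        repartitioned.foreachRDD { rdd =>
          // Downstream processing now runs with 140 tasks per batch
          rdd.foreach(record => process(record))
        }

        ssc.start()
        ssc.awaitTermination()
      }

      // Placeholder for application logic
      def process(record: String): Unit = ()
    }

One caveat worth noting: the shuffle breaks the one-to-one mapping with Kafka partitions, so if you track offsets manually via HasOffsetRanges, you must capture them from the original direct stream before calling repartition.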