Re: Need more tasks in KafkaDirectStream

2015-10-29 Thread Cody Koeninger
> ... to 140 partitions (20 cores * 7 nodes) and it works great.
>
> -adrian
>
> From: varun sharma
> Date: Thursday, October 29, 2015 at 8:27 AM
> To: user
> Subject: Need more tasks in KafkaDirectStream
>
> Right now, there is a one-to-one correspondence between Kafka partitions and Spark partitions...

Re: Need more tasks in KafkaDirectStream

2015-10-29 Thread varun sharma
>> ... in the ability to scale out processing beyond your number of partitions.
>>
>> We're doing this to scale up from 36 partitions / topic to 140 partitions (20 cores * 7 nodes) and it works great.
>>
>> -adrian
>>
>> From: varun sharma
>> Date: ...

Need more tasks in KafkaDirectStream

2015-10-29 Thread varun sharma
Right now, there is a one-to-one correspondence between Kafka partitions and Spark partitions. I don't have a requirement of one-to-one semantics; I need more tasks to be generated in the job so that it can be parallelised and the batch can complete faster. In the previous Receiver-based approach...
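
For context, a minimal sketch of the behaviour being described, assuming the Spark 1.x direct API (the broker address and topic name are placeholders): with createDirectStream, each batch RDD has exactly as many partitions as the Kafka topic, so the task count per stage is capped by the partition count.

import kafka.serializer.StringDecoder
import org.apache.spark.SparkConf
import org.apache.spark.streaming.kafka.KafkaUtils
import org.apache.spark.streaming.{Seconds, StreamingContext}

object DirectStreamExample {
  def main(args: Array[String]): Unit = {
    val ssc = new StreamingContext(
      new SparkConf().setAppName("direct-stream-example"), Seconds(10))

    val kafkaParams = Map("metadata.broker.list" -> "broker1:9092") // placeholder
    val topics = Set("events")                                      // placeholder

    // The direct approach creates one Spark partition per Kafka partition,
    // so a 3-partition topic yields 3 tasks per stage no matter how many
    // cores the cluster has.
    val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
      ssc, kafkaParams, topics)

    stream.foreachRDD { rdd =>
      // Equals the Kafka partition count on every batch.
      println(s"partitions = ${rdd.partitions.size}")
    }

    ssc.start()
    ssc.awaitTermination()
  }
}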

Re: Need more tasks in KafkaDirectStream

2015-10-29 Thread Dibyendu Bhattacharya
If you do not need one-to-one semantics and do not want a strict ordering guarantee, you can very well use the Receiver-based approach, and this consumer from Spark Packages (https://github.com/dibbhatt/kafka-spark-consumer) can give a much better alternative in terms of performance and...
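
The linked package ships its own receiver with its own API; as a rough illustration of the general receiver-based pattern it builds on, here is a sketch using the stock KafkaUtils.createStream instead (ZooKeeper address, group, topic, and receiver count are all placeholders). Several receivers in the same consumer group are unioned, so partitioning is driven by the block interval rather than the Kafka partition count:

import org.apache.spark.SparkConf
import org.apache.spark.streaming.kafka.KafkaUtils
import org.apache.spark.streaming.{Seconds, StreamingContext}

object ReceiverStreamExample {
  def main(args: Array[String]): Unit = {
    val ssc = new StreamingContext(
      new SparkConf().setAppName("receiver-stream-example"), Seconds(10))

    val zkQuorum = "zk1:2181" // placeholder ZooKeeper quorum
    val numReceivers = 4      // assumption: tune to your cluster

    // Multiple receivers consume the topic in parallel under one consumer
    // group; their streams are unioned into a single DStream.
    val streams = (1 to numReceivers).map { _ =>
      KafkaUtils.createStream(ssc, zkQuorum, "my-group", Map("events" -> 1))
    }
    val unified = ssc.union(streams)

    unified.count().print()

    ssc.start()
    ssc.awaitTermination()
  }
}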

Re: Need more tasks in KafkaDirectStream

2015-10-29 Thread Adrian Tanase
... to 140 partitions (20 cores * 7 nodes) and it works great.

-adrian

From: varun sharma
Date: Thursday, October 29, 2015 at 8:27 AM
To: user
Subject: Need more tasks in KafkaDirectStream

Right now, there is a one-to-one correspondence between Kafka partitions and Spark partitions. I don't have a requirement of one-to-one semantics. I need...
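
Adrian's snippets describe calling repartition on the direct stream to go from 36 to 140 partitions. A minimal sketch of that approach, assuming the Spark 1.x direct API with placeholder broker and topic names:

import kafka.serializer.StringDecoder
import org.apache.spark.SparkConf
import org.apache.spark.streaming.kafka.KafkaUtils
import org.apache.spark.streaming.{Seconds, StreamingContext}

object RepartitionExample {
  def main(args: Array[String]): Unit = {
    val ssc = new StreamingContext(
      new SparkConf().setAppName("repartition-example"), Seconds(10))

    val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
      ssc,
      Map("metadata.broker.list" -> "broker1:9092"), // placeholder broker
      Set("events"))                                 // placeholder topic

    // repartition shuffles each batch across the cluster: data locality is
    // lost, but every stage after this point runs with 140 tasks instead of
    // one task per Kafka partition.
    val widened = stream.repartition(140)

    widened.count().print()

    ssc.start()
    ssc.awaitTermination()
  }
}

The trade-off is one shuffle per batch; as the thread notes, that cost is often repaid by being able to use more cores than there are Kafka partitions.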