One thing you should look at is your batch duration and 
spark.streaming.blockInterval

Those two settings control how many partitions are generated for each RDD (batch) 
of the DStream when using a receiver (as opposed to the direct approach).

So if you have a 2-second batch duration and the default blockInterval of 200 ms, 
each batch RDD will have 2000 ms / 200 ms = 10 partitions. This means you can have 
at most 10 parallel tasks running at a time for a map-like operation (as long as 
you have the cores available).
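As a rough sketch of how those two settings fit together (the app name, receiver class, and repartition count below are illustrative placeholders, not from the original post):

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

// Illustrative sketch: a 2-second batch with the default 200 ms blockInterval
// produces 2000 ms / 200 ms = 10 blocks, i.e. 10 partitions per batch RDD.
val conf = new SparkConf()
  .setAppName("jms-streaming-sketch")              // placeholder app name
  .set("spark.streaming.blockInterval", "200ms")   // default; lower it for more partitions

val ssc = new StreamingContext(conf, Seconds(2))   // 2-second batch duration

// Plug in the custom receiver (class name is hypothetical):
// val stream = ssc.receiverStream(new YourJmsReceiver(...))

// If 10 partitions is not enough downstream, you can also repartition
// explicitly, at the cost of a shuffle:
// val wider = stream.repartition(40)
```

Lowering the blockInterval raises the partition count per batch without a shuffle; `repartition` gives finer control but shuffles the data across the cluster.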




On 9/25/15, 9:08 AM, "nib...@free.fr" <nib...@free.fr> wrote:

>Hello,
>I used a custom receiver in order to receive JMS messages from MQ Servers.
>I want to benefit from the YARN cluster; my questions are:
>
>- Is it possible to have only one node receiving JMS messages and parallelize 
>the RDD over all the cluster nodes?
>- Is it also possible to parallelize the message receiver over cluster nodes?
>
>If you have any code example for both items it would be fine, because the 
>parallelization mechanism in the code is not crystal clear for me ...
>
>Tks
>Nicolas
>
>
