I think you need to dig into the custom receiver implementation. As long as the 
source is distributed and partitioned, the downstream .map, .foreachRDD, etc. 
are all distributed as you would expect.

You could look at how the "classic" Kafka receiver is instantiated in the 
streaming guide and start from there:
http://spark.apache.org/docs/latest/streaming-programming-guide.html#level-of-parallelism-in-data-receiving
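The guide's suggestion boils down to starting several receiver streams and unioning them, so that ingestion itself is spread across executors. A minimal sketch, assuming the `JmsReceiver` and `JMSEvent` types from the quoted code below (the constructor arguments and receiver count are illustrative, not from the original post):

```java
import java.util.ArrayList;
import java.util.List;

import org.apache.spark.SparkConf;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;

public class MultiReceiverSketch {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("multi-receiver-sketch");
        JavaStreamingContext ssc =
            new JavaStreamingContext(conf, Durations.seconds(5));

        // Each receiver occupies one executor core, so make sure the
        // cluster has more cores than receivers, or processing will starve.
        int numReceivers = 3;
        List<JavaDStream<JMSEvent>> streams = new ArrayList<>();
        for (int i = 0; i < numReceivers; i++) {
            // JmsReceiver is assumed to be the poster's custom Receiver<JMSEvent>
            streams.add(ssc.receiverStream(new JmsReceiver()));
        }

        // Union the per-receiver streams into a single DStream; downstream
        // transformations (map, foreachRDD, ...) then run over all partitions.
        JavaDStream<JMSEvent> unioned =
            ssc.union(streams.get(0), streams.subList(1, streams.size()));

        unioned.map(JMSEvent::getText).print();

        ssc.start();
        ssc.awaitTermination();
    }
}
```

The key point is that parallelism in *receiving* only comes from running multiple receivers; a single custom receiver always runs on exactly one executor, no matter how many cores the cluster has.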

-adrian

On 9/22/15, 1:51 AM, "nib...@free.fr" <nib...@free.fr> wrote:

>Hello, 
>Could you please explain what exactly is distributed when I launch a Spark 
>Streaming job on a YARN cluster?
>My code is something like:
>
>JavaDStream<JMSEvent> customReceiverStream = 
>ssc.receiverStream(streamConfig.getJmsReceiver());
>
>JavaDStream<String> incoming_msg = customReceiverStream.map(
>    new Function<JMSEvent, String>() {
>        public String call(JMSEvent jmsEvent) {
>            return jmsEvent.getText();
>        }
>    });
>
>incoming_msg.foreachRDD(new Function<JavaRDD<String>, Void>() {
>    public Void call(JavaRDD<String> rdd) throws Exception {
>        rdd.foreachPartition(new VoidFunction<Iterator<String>>() {
>            @Override
>            public void call(Iterator<String> msg) throws Exception {
>                while (msg.hasNext()) {
>                    // insert message in MongoDB
>                }
>....
>
>So, in this code, at which step does the distribution over YARN happen:
>- Is my receiver distributed (and therefore everything downstream as well)?
>- Is foreachRDD distributed (and therefore everything downstream as well)?
>- Is foreachPartition distributed?
>
>Tks
>Nicolas
>
>---------------------------------------------------------------------
>To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
>For additional commands, e-mail: user-h...@spark.apache.org
>
