Spark Streaming distributed job

2015-09-21 Thread nibiau
Hello, 
Could you please explain what exactly is distributed when I launch a Spark 
streaming job over a YARN cluster?
My code is something like :

JavaDStream<JMSEvent> customReceiverStream = 
    ssc.receiverStream(streamConfig.getJmsReceiver());

JavaDStream<String> incoming_msg = customReceiverStream.map(
    new Function<JMSEvent, String>() {
        public String call(JMSEvent jmsEvent) {
            return jmsEvent.getText();
        }
    });

incoming_msg.foreachRDD(new Function<JavaRDD<String>, Void>() {
    public Void call(JavaRDD<String> rdd) throws Exception {
        rdd.foreachPartition(new VoidFunction<Iterator<String>>() {
            @Override
            public void call(Iterator<String> msg) throws Exception {
                while (msg.hasNext()) {
                    String m = msg.next();
                    // insert message in MongoDB
                }
            }
        });
        return null;
    }
});
So, in this code, at which step does the distribution over YARN happen:
- Is my receiver distributed (and therefore all the rest as well)?
- Is the foreachRDD distributed (and therefore all the rest as well)?
- Is foreachPartition distributed?

Tks
Nicolas

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: Spark Streaming distributed job

2015-09-22 Thread Adrian Tanase
I think you need to dig into the custom receiver implementation. As long as the 
source is distributed and partitioned, the downstream .map and .foreachXX 
operations are all distributed as you would expect.

You could look at how the “classic” Kafka receiver is instantiated in the 
streaming guide and try to start from there:
http://spark.apache.org/docs/latest/streaming-programming-guide.html#level-of-parallelism-in-data-receiving
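To make that concrete: a single receiver instance runs on one executor, so ingestion itself is not parallel unless you create several receivers and union their streams, which is the pattern that section of the guide describes. A minimal sketch of it, assuming (as in your post) that streamConfig.getJmsReceiver() can hand out independent receiver instances:

```java
// Sketch only, per the "Level of Parallelism in Data Receiving" section of the
// streaming guide. JmsReceiver/streamConfig are assumptions carried over from
// the original post, not a real API.
int numReceivers = 3;
List<JavaDStream<JMSEvent>> streams = new ArrayList<>();
for (int i = 0; i < numReceivers; i++) {
    // each receiverStream() call creates one receiver,
    // scheduled on its own executor
    streams.add(ssc.receiverStream(streamConfig.getJmsReceiver()));
}
// union the per-receiver streams into one DStream; downstream map/foreachRDD/
// foreachPartition then run across partitions produced by all receivers
JavaDStream<JMSEvent> unioned =
    ssc.union(streams.get(0), streams.subList(1, streams.size()));
```

The foreachPartition part of your code is already the right shape for the MongoDB writes: the closure runs once per partition on whichever executor holds that partition, so the connection can be opened per partition rather than per message.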




-adrian

On 9/22/15, 1:51 AM, "nib...@free.fr"  wrote:

>Hello, 
>Could you please explain what exactly is distributed when I launch a Spark 
>streaming job over a YARN cluster?
>[...]
>So, in this code, at which step does the distribution over YARN happen:
>- Is my receiver distributed (and therefore all the rest as well)?
>- Is the foreachRDD distributed (and therefore all the rest as well)?
>- Is foreachPartition distributed?
>
>Tks
>Nicolas