Spark - launching job for each action

2015-09-06 Thread Priya Ch
Hi All,

In Spark, each action results in launching a job. Let's say my Spark app looks like this:

    val baseRdd = sc.parallelize(Array(1, 2, 3, 4, 5), 2)
    val rdd1 = baseRdd.map(x => x + 2)
    val rdd2 = rdd1.filter(x => x % 2 == 0)
    val count = rdd2.count
    val firstElement = rdd2.first
    println("Count is " + count)
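A minimal runnable version of the snippet above, showing that the two actions (`count` and `first`) each submit their own job while the `map` and `filter` stay lazy. The app name and `local[2]` master are illustrative choices for running this standalone:

```scala
import org.apache.spark.{SparkConf, SparkContext}

object TwoActions {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("two-actions").setMaster("local[2]")
    val sc = new SparkContext(conf)

    val baseRdd = sc.parallelize(Array(1, 2, 3, 4, 5), 2)
    val rdd1 = baseRdd.map(x => x + 2)       // transformation: lazy, no job yet
    val rdd2 = rdd1.filter(x => x % 2 == 0)  // transformation: lazy, no job yet

    val count = rdd2.count()        // action 1: submits one job
    val firstElement = rdd2.first() // action 2: submits a second job
    println("Count is " + count + ", first element is " + firstElement)

    sc.stop()
  }
}
```

Running this and opening the Spark UI shows two completed jobs, one per action; `first()` may scan only the first partition rather than the whole RDD.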

Re: Spark - launching job for each action

2015-09-06 Thread ayan guha
Hi,

"... Here in job2, when calculating rdd2.first..." If you mean rdd2.first, it can reuse the partitions of rdd2 that were already computed by rdd2.count, because they are already available. If some partitions are no longer available (e.g. lost due to GC), only those partitions are recomputed.
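A way to make the reuse described above explicit is to persist the RDD before running the actions; this is a sketch (the `cache()` call is my addition, not from the original snippet):

```scala
import org.apache.spark.{SparkConf, SparkContext}

object CachedActions {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("cached-actions").setMaster("local[2]")
    val sc = new SparkContext(conf)

    val rdd2 = sc.parallelize(Array(1, 2, 3, 4, 5), 2)
      .map(x => x + 2)
      .filter(x => x % 2 == 0)
      .cache()                      // request in-memory storage for rdd2's partitions

    val count = rdd2.count()        // first action: computes rdd2 and fills the cache
    val firstElement = rdd2.first() // second action: reads the cached partitions
    // If a cached partition is evicted (e.g. under memory pressure),
    // Spark recomputes just that partition from the lineage.
    println("count=" + count + " first=" + firstElement)

    sc.stop()
  }
}
```

Without an explicit `cache()`/`persist()`, the second action may recompute rdd2 from its lineage rather than reuse the first action's work.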

Re: Spark - launching job for each action

2015-09-06 Thread Priya Ch
Hi All,

Thanks for the info. I have one more doubt: when writing a streaming application, I specify a batch interval. Let's say the interval is 1 sec; for every 1-sec batch an RDD is formed and a job is launched. If there is more than one action specified on an RDD, how many jobs would it launch?
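Following the thread's premise that each action submits a job, a micro-batch with two actions would submit (up to) two jobs per batch interval. A sketch of such a streaming app; the socket source on localhost:9999 is purely illustrative:

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object TwoActionsPerBatch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("two-actions-per-batch").setMaster("local[2]")
    val ssc = new StreamingContext(conf, Seconds(1)) // 1-second batch interval

    // Illustrative input source; any input DStream behaves the same way.
    val lines = ssc.socketTextStream("localhost", 9999)

    lines.foreachRDD { rdd =>
      val total = rdd.count()   // action 1: one job for this batch's RDD
      if (total > 0)
        println("first=" + rdd.first() + " count=" + total) // action 2: a second job
    }

    ssc.start()
    ssc.awaitTermination()
  }
}
```

Here every non-empty batch triggers two jobs (`count` and `first`), while an empty batch triggers only the `count` job, since the second action is skipped.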