Hi All,
In Spark, each action results in launching a job. Let's say my Spark app
looks like this:
val baseRDD = sc.parallelize(Array(1, 2, 3, 4, 5), 2)
val rdd1 = baseRDD.map(x => x + 2)
val rdd2 = rdd1.filter(x => x % 2 == 0)
val count = rdd2.count
val firstElement = rdd2.first
println("Count is " + count)
Hi
"... Here in job2, when calculating rdd.first..."
If you mean rdd2.first, then it uses the rdd2 already computed by
rdd2.count, because it is already available. If some partitions are not
available (e.g. evicted due to GC), only those partitions are recomputed.
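For what it's worth, reuse across actions is only guaranteed if the RDD is explicitly persisted; without cache()/persist(), a later action may recompute the lineage. A minimal sketch of the app above with caching added (the object name and local[2] master are illustrative, assuming a local Spark setup):

```scala
import org.apache.spark.{SparkConf, SparkContext}

object TwoActionsExample {
  def main(args: Array[String]): Unit = {
    // Local master for illustration only
    val sc = new SparkContext(
      new SparkConf().setAppName("two-actions").setMaster("local[2]"))

    val baseRDD = sc.parallelize(Array(1, 2, 3, 4, 5), 2)
    // cache() so the two actions below can share the computed partitions
    val rdd2 = baseRDD.map(x => x + 2).filter(x => x % 2 == 0).cache()

    val count = rdd2.count()        // job 1: computes and caches both partitions
    val firstElement = rdd2.first() // job 2: reads partition 0 from the cache
    println("Count is " + count + ", first element is " + firstElement)

    sc.stop()
  }
}
```

With the input above, the transformations yield Array(4, 6), so count is 2 and the first element is 4; the second job still launches, but it hits the cache instead of recomputing.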
On Sun, Sep 6, 2015 at 5:11 PM, Jeff
Hi All,
Thanks for the info. I have one more doubt:
When writing a streaming application, I specify a batch interval. Let's say
the interval is 1 sec; then for every 1-sec batch, an RDD is formed and a
job is launched. If there is more than one action specified on an RDD, how
many jobs would it launch?
I mean
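For concreteness, the scenario being asked about might look like the sketch below, with two output operations on the same DStream per batch (the object name, socket source, and local[2] master are all illustrative assumptions, not from the thread):

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object TwoActionStreaming {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("two-actions-streaming")
      .setMaster("local[2]") // illustrative local master

    // 1-second batch interval, matching the question above
    val ssc = new StreamingContext(conf, Seconds(1))

    // Hypothetical text source for illustration
    val lines = ssc.socketTextStream("localhost", 9999)
    val words = lines.flatMap(line => line.split(" "))

    // Two output operations on the same DStream within each batch:
    words.print()
    words.count().print()

    ssc.start()
    ssc.awaitTermination()
  }
}
```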