Hi

"... Here in job2, when calculating rdd.first..."

If you mean rdd2.first, then it uses the rdd2 already computed by
rdd2.count, because it is already available. If some partitions are not
available (for example, evicted due to GC or memory pressure), only those
partitions are recomputed.
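
For example, here is a minimal sketch (reusing the variable names from your
snippet) of explicitly caching rdd2, as Jeff suggested, so that both actions
work against the same computed partitions:

val baseRDD = sc.parallelize(Array(1, 2, 3, 4, 5), 2)
val rdd2 = baseRDD.map(x => x + 2).filter(x => x % 2 == 0)

rdd2.cache()                  // mark rdd2 for in-memory storage

val count = rdd2.count        // first action: computes rdd2 and caches its partitions
val firstElement = rdd2.first // second action: reads the cached partitions;
                              // any evicted partition is recomputed from the lineage

println("Count is " + count)
println("First is " + firstElement)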

On Sun, Sep 6, 2015 at 5:11 PM, Jeff Zhang <zjf...@gmail.com> wrote:

> If you want to reuse the data, you need to call rdd2.cache
>
>
>
> On Sun, Sep 6, 2015 at 2:33 PM, Priya Ch <learnings.chitt...@gmail.com>
> wrote:
>
>> Hi All,
>>
>>  In Spark, each action results in launching a job. Let's say my Spark app
>> looks as follows:
>>
>> val baseRDD = sc.parallelize(Array(1, 2, 3, 4, 5), 2)
>> val rdd1 = baseRDD.map(x => x + 2)
>> val rdd2 = rdd1.filter(x => x % 2 == 0)
>> val count = rdd2.count
>> val firstElement = rdd2.first
>>
>> println("Count is " + count)
>> println("First is " + firstElement)
>>
>> Now, rdd2.count launches job0 with 1 task and rdd2.first launches job1
>> with 1 task. Here in job1, when calculating rdd2.first, is the entire
>> lineage computed again, or is rdd2 reused since job0 has already computed
>> it?
>>
>> Thanks,
>> Padma Ch
>>
>>
>
>
>
> --
> Best Regards
>
> Jeff Zhang
>



-- 
Best Regards,
Ayan Guha
