Depending on what you do with them, they will get computed separately,
because each branch may have a long DAG of its own. Spark pipelines the
transformations within a branch together rather than trying to optimize
across branches.
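
For illustration, a minimal self-contained sketch (the local SparkContext
setup, sample data, and object name are assumptions for the example, not
from the thread). Calling cache() on the shared parent marks it for
storage, so the first action materializes the map output once and the
second branch reads the cached partitions instead of re-running the map
stage:

import org.apache.spark.{SparkConf, SparkContext}

object SharedParentDemo {
  def main(args: Array[String]): Unit = {
    // Local context for illustration only.
    val conf = new SparkConf().setAppName("shared-parent-demo").setMaster("local[*]")
    val sc = new SparkContext(conf)

    // Stand-in for `input` from the thread: plain integers instead of
    // objects with a .value field.
    val input = sc.parallelize(Seq(1, 2, 1, 3, 2, 1))

    // cache() stores the shared parent the first time an action computes
    // it; the second branch then reads the cached partitions.
    val rdd = input.map(x => x).cache()
    val f1 = rdd.filter(_ == 1)
    val f2 = rdd.filter(_ == 2)

    // Two separate jobs; without cache() each would recompute `rdd`.
    println(f1.count()) // 3
    println(f2.count()) // 2

    sc.stop()
  }
}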
On Jul 16, 2015 1:40 PM, "Bin Wang" <wbi...@gmail.com> wrote:

> What if I would use both rdd1 and rdd2 later?
>
> Raghavendra Pandey <raghavendra.pan...@gmail.com> wrote on Thursday, Jul 16, 2015 at 4:08 PM:
>
>> If you cache the rdd, it will save some recomputation. But filter is a
>> lazy operation anyway, and when it actually runs depends on what you do
>> later with rdd1 and rdd2...
>>
>> Raghavendra
>> On Jul 16, 2015 1:33 PM, "Bin Wang" <wbi...@gmail.com> wrote:
>>
>>> If I write code like this:
>>>
>>> val rdd = input.map(_.value)
>>> val f1 = rdd.filter(_ == 1)
>>> val f2 = rdd.filter(_ == 2)
>>> ...
>>>
>>> Then the DAG of the execution may be this:
>>>
>>>          -> Filter -> ...
>>> Map
>>>          -> Filter -> ...
>>>
>>> But the two filters operate on the same RDD, which means they could be
>>> done by scanning the RDD once. Does Spark have this kind of optimization
>>> currently?
>>>
>>
