What if I want to use both rdd1 and rdd2 later?

Raghavendra Pandey <raghavendra.pan...@gmail.com> wrote on Thu, Jul 16, 2015 at 4:08 PM:
> If you cache the rdd it will save some operations. But filter is a lazy
> operation anyway, and it runs based on what you do later on with rdd1
> and rdd2.
>
> Raghavendra
>
> On Jul 16, 2015 1:33 PM, "Bin Wang" <wbi...@gmail.com> wrote:
>
>> If I write code like this:
>>
>> val rdd = input.map(_.value)
>> val f1 = rdd.filter(_ == 1)
>> val f2 = rdd.filter(_ == 2)
>> ...
>>
>> Then the DAG of the execution may be this:
>>
>>         -> Filter -> ...
>> Map
>>         -> Filter -> ...
>>
>> But the two filters operate on the same RDD, which means it could be
>> done by scanning the RDD just once. Does Spark have this kind of
>> optimization for now?
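To make the caching suggestion concrete, here is a minimal sketch of what Raghavendra describes. It assumes a local SparkContext and uses `cache()` on the shared parent RDD so that the mapped data is materialized once and both filters read the cached partitions instead of each triggering a full recomputation from the input. The sample data and names here are illustrative, not from the thread:

```scala
import org.apache.spark.{SparkConf, SparkContext}

object CacheSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("cache-sketch").setMaster("local[*]")
    val sc   = new SparkContext(conf)

    // Illustrative input; in the thread `input.map(_.value)` produces this RDD.
    val input = sc.parallelize(Seq(1, 2, 1, 3, 2))

    // cache() marks the RDD for persistence; it is actually materialized
    // on the first action that touches it.
    val rdd = input.map(identity).cache()

    val rdd1 = rdd.filter(_ == 1)
    val rdd2 = rdd.filter(_ == 2)

    // The first action computes `rdd` and fills the cache; the second
    // action reuses the cached partitions rather than rescanning `input`.
    println(rdd1.count())
    println(rdd2.count())

    sc.stop()
  }
}
```

Without the `cache()` call, each action (`rdd1.count()` and `rdd2.count()`) independently recomputes the lineage back to `input`, i.e. the map runs twice; Spark does not automatically merge the two filter branches into a single scan.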