What if I want to use both rdd1 and rdd2 later?

Raghavendra Pandey <raghavendra.pan...@gmail.com> wrote on Thu, Jul 16, 2015 at 4:08 PM:
> If you cache the rdd it will save some operations. But filter is a lazy
> operation anyway, and it runs based on what you do later on with rdd1
> and rdd2.
>
> Raghavendra
>
> On Jul 16, 2015 1:33 PM, "Bin Wang" <wbi...@gmail.com> wrote:
>
>> If I write code like this:
>>
>> val rdd = input.map(_.value)
>> val f1 = rdd.filter(_ == 1)
>> val f2 = rdd.filter(_ == 2)
>> ...
>>
>> Then the DAG of the execution may be this:
>>
>>         -> Filter -> ...
>> Map
>>         -> Filter -> ...
>>
>> But the two filters operate on the same RDD, which means it could be
>> done by scanning the RDD just once. Does Spark have this kind of
>> optimization for now?
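To make the caching suggestion concrete, here is a minimal sketch of what Raghavendra describes. It assumes a local SparkContext and uses `cache()` on the shared parent RDD so that the mapped data is materialized once and both filters read the cached partitions instead of each triggering a full recomputation from the input. The sample data and names here are illustrative, not from the thread:

```scala
import org.apache.spark.{SparkConf, SparkContext}

object CacheSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("cache-sketch").setMaster("local[*]")
    val sc   = new SparkContext(conf)

    // Illustrative input; in the thread `input.map(_.value)` produces this RDD.
    val input = sc.parallelize(Seq(1, 2, 1, 3, 2))

    // cache() marks the RDD for persistence; it is actually materialized
    // on the first action that touches it.
    val rdd = input.map(identity).cache()

    val rdd1 = rdd.filter(_ == 1)
    val rdd2 = rdd.filter(_ == 2)

    // The first action computes `rdd` and fills the cache; the second
    // action reuses the cached partitions rather than rescanning `input`.
    println(rdd1.count())
    println(rdd2.count())

    sc.stop()
  }
}
```

Without the `cache()` call, each action (`rdd1.count()` and `rdd2.count()`) independently recomputes the lineage back to `input`, i.e. the map runs twice; Spark does not automatically merge the two filter branches into a single scan.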