If you cache the RDD, it will save some recomputation. But `filter` is a lazy operation anyway, and what actually gets executed depends on what you later do with f1 and f2...
Raghavendra

On Jul 16, 2015 1:33 PM, "Bin Wang" <wbi...@gmail.com> wrote:
> If I write code like this:
>
> val rdd = input.map(_.value)
> val f1 = rdd.filter(_ == 1)
> val f2 = rdd.filter(_ == 2)
> ...
>
> Then the DAG of the execution may be this:
>
>       -> Filter -> ...
> Map
>       -> Filter -> ...
>
> But the two filters is operated on the same RDD, which means it could be
> done by just scan the RDD once. Does spark have this kind optimization for
> now?
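A minimal sketch of the caching suggestion, assuming Spark 1.x with an existing SparkContext `sc` and the `input` RDD from the quoted code (the `.value` field and the filter predicates are taken from the original mail; the `count()` actions are just placeholders for whatever you do downstream):

```scala
// Sketch, not a definitive implementation. Assumes `input` is an
// RDD whose elements have a `value` field, as in the quoted code.
val rdd = input.map(_.value)
rdd.cache()               // marks the RDD for caching; nothing runs yet

val f1 = rdd.filter(_ == 1)
val f2 = rdd.filter(_ == 2)

f1.count()                // first action: scans input, runs the map,
                          // and materializes `rdd` into the cache
f2.count()                // second action: reads the cached map output
                          // instead of re-scanning and re-mapping input
```

Without the `cache()` call, each action re-runs the whole lineage (scan + map + filter); Spark does not automatically merge the two filter branches into a single scan, so caching the shared parent is the way to avoid doing the map twice.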