If I write code like this:

    val rdd = input.map(_.value)
    val f1 = rdd.filter(_ == 1)
    val f2 = rdd.filter(_ == 2)
    ...
Then the DAG of the execution may look like this:

          -> Filter -> ...
    Map
          -> Filter -> ...

But the two filters operate on the same RDD, which means both results could be produced by scanning that RDD just once. Does Spark currently have this kind of optimization?
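To make concrete what a fused single scan would mean, here is a plain-Scala sketch (no Spark involved; the `splitOnesTwos` helper and the sample data are made up for illustration) that builds both filter results in one pass over the data:

```scala
object SingleScan {
  // One scan over the input produces both "filter" outputs at once,
  // instead of two separate passes (one per filter).
  def splitOnesTwos(values: Seq[Int]): (Vector[Int], Vector[Int]) =
    values.foldLeft((Vector.empty[Int], Vector.empty[Int])) {
      case ((ones, twos), 1) => (ones :+ 1, twos)
      case ((ones, twos), 2) => (ones, twos :+ 2)
      case (acc, _)          => acc
    }

  def main(args: Array[String]): Unit = {
    val (f1, f2) = splitOnesTwos(Seq(1, 2, 3, 1, 2))
    println(f1) // Vector(1, 1)
    println(f2) // Vector(2, 2)
  }
}
```

This is the kind of sharing the question is asking about: both outputs come from a single traversal of the shared input, rather than one traversal per filter.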