There is a *PartitionPruningRDD*:
    :: DeveloperApi ::
    A RDD used to prune RDD partitions/partitions so we can avoid launching
    tasks on all partitions. An example use case: If we know the RDD is
    partitioned by range, and the execution DAG has a filter on the key, we
    can avoid launching tasks on partitions that don't have the range
    covering the key.

It seems made exactly for this case, but it's marked as DeveloperApi. Does anyone know how to use it?

On Mon, Dec 8, 2014 at 11:31 AM, nsareen <nsar...@gmail.com> wrote:

> @Sowen, would appreciate, if you can explain how would Spark SQL help in my
> scenario..
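For what it's worth, here is a minimal sketch of how PartitionPruningRDD could be used for the range-partitioned case described in its doc comment. It relies on `PartitionPruningRDD.create` and `RangePartitioner.getPartition` from Spark core. The object name `PruneByKeyRange`, the method `lookupRange`, the sample data and the partition count are all made up for illustration, and it assumes a local master, so treat it as a starting point rather than a recommended pattern.

```scala
import org.apache.spark.{RangePartitioner, SparkConf, SparkContext}
import org.apache.spark.SparkContext._ // pair-RDD implicits; only needed on older Spark 1.x releases
import org.apache.spark.rdd.PartitionPruningRDD

// Hypothetical helper, not part of Spark.
object PruneByKeyRange {

  /** Returns (partitions scanned, total partitions, matching records) for keys in [lo, hi]. */
  def lookupRange(sc: SparkContext, lo: Int, hi: Int): (Int, Int, Array[(Int, String)]) = {
    // Illustrative data set: keys 1..1000, range-partitioned into up to 8 partitions.
    val pairs = sc.parallelize(1 to 1000).map(k => (k, s"value-$k"))
    val partitioner = new RangePartitioner(8, pairs)
    val ranged = pairs.partitionBy(partitioner)

    // Ask the partitioner which partitions hold the two ends of the key range.
    // With the default ascending order, every key in [lo, hi] lives between them.
    val firstPart = partitioner.getPartition(lo)
    val lastPart = partitioner.getPartition(hi)

    // Keep only those partitions; no tasks are launched for the others.
    val pruned = PartitionPruningRDD.create(
      ranged, (idx: Int) => idx >= firstPart && idx <= lastPart)

    // The key filter is still required: the boundary partitions also hold
    // keys that fall outside [lo, hi].
    val hits = pruned.filter { case (k, _) => k >= lo && k <= hi }.collect()
    (pruned.partitions.length, ranged.partitions.length, hits)
  }

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("partition-pruning-sketch").setMaster("local[2]"))
    try {
      val (scanned, total, hits) = lookupRange(sc, 100, 150)
      println(s"scanned $scanned of $total partitions, found ${hits.length} records")
    } finally {
      sc.stop()
    }
  }
}
```

The pruning only pays off if the partitioner that laid out the data is still available when the query runs, which in practice means keeping the range-partitioned RDD (and its partitioner) cached. The DeveloperApi marker means the class may change between releases, so it is worth rechecking the signature against the Spark version in use.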