Re: Does filter on an RDD scan every data item ?

dsiegel Thu, 11 Dec 2014 12:39:36 -0800

Also, you may want to use .lookup() instead of .filter().

def lookup(key: K): Seq[V]
Return the list of values in the RDD for key key. This operation is done
efficiently if the RDD has a known partitioner by only searching the
partition that the key maps to.
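
To make the difference concrete, here is a minimal sketch that fetches the values for one key both ways. It assumes a local Spark setup; the object name, the sample data, and the local[2] master are made up for illustration and are not from the original question.

```scala
import org.apache.spark.{SparkConf, SparkContext}
// Only needed on Spark releases before 1.3 to get the pair-RDD implicits; harmless later.
import org.apache.spark.SparkContext._

object LookupSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("lookup-sketch").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      // Illustrative data only.
      val pairs = sc.parallelize(Seq(("a", 1), ("b", 2), ("a", 3)))

      // filter applies the predicate to every element of every partition.
      val viaFilter = pairs.filter { case (k, _) => k == "a" }.values.collect().toSeq

      // lookup returns the values for a single key. This RDD has no partitioner,
      // so here lookup still has to search all partitions.
      val viaLookup = pairs.lookup("a")

      assert(viaFilter.sorted == viaLookup.sorted)
      println(viaLookup)
    } finally {
      sc.stop()
    }
  }
}
```

Both calls return the same values; lookup only becomes cheaper once the RDD has a known partitioner.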


You might want to partition your first batch of data with .partitionBy()
using your CustomTuple hash implementation, persist it, and avoid running any
operations on it that would remove its partitioner object (for example,
map() drops the partitioner, while mapValues() and filter() keep it).
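
A rough sketch of that setup follows. The CustomTuple case class here is a hypothetical stand-in for your own key type, and the partition count of 4, the sample data, and the local[2] master are arbitrary assumptions.

```scala
import org.apache.spark.{HashPartitioner, SparkConf, SparkContext}
// Only needed on Spark releases before 1.3 to get the pair-RDD implicits; harmless later.
import org.apache.spark.SparkContext._

// Hypothetical key type; a case class supplies consistent equals and hashCode.
case class CustomTuple(a: Int, b: Int)

object PartitionedLookupSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("partitioned-lookup-sketch").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      // Illustrative data only.
      val data = sc.parallelize(Seq(
        (CustomTuple(1, 1), "x"),
        (CustomTuple(2, 2), "y"),
        (CustomTuple(1, 1), "z")))

      // HashPartitioner assigns each record to a partition based on key.hashCode.
      val partitioned = data.partitionBy(new HashPartitioner(4)).persist()

      // mapValues cannot change keys, so the partitioner is kept.
      val upper = partitioned.mapValues(_.toUpperCase)
      assert(upper.partitioner.isDefined)

      // map may change keys, so Spark drops the partitioner.
      assert(partitioned.map(kv => kv).partitioner.isEmpty)

      // With a known partitioner, lookup searches only the partition the key maps to.
      val hits = upper.lookup(CustomTuple(1, 1))
      assert(hits.sorted == Seq("X", "Z"))
      println(hits)
    } finally {
      sc.stop()
    }
  }
}
```

Persisting the partitioned RDD matters because otherwise each lookup would recompute the shuffle that partitionBy introduces.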

