Hi,

I am working with Spark in Java on top of an HDFS cluster. In my code, two
RDDs are partitioned with the same partitioner (a HashPartitioner with the
same number of partitions), so they are co-partitioned.
This means that equal keys land in partitions with the same index in both
RDDs. It does not mean that the RDDs are co-located, that is, that
partitions with the same index are stored on the same node.
For example, partition #1 of RDD #1 may not be on the same node as
partition #1 of RDD #2. I would like to co-locate the partitions of the two
RDDs to reduce data transfer between nodes when joining them.
Is there a way to do that?
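
To make the distinction concrete, here is a minimal plain-Java sketch of
what I mean. It is not Spark code: partitionFor only imitates the
non-negative modulo of hashCode() that, as far as I understand, a
HashPartitioner applies to non-null keys. The class name, the helper names
and the node names are all hypothetical and chosen for illustration only.

```java
// Plain-Java imitation of hash partitioning; this is NOT the Spark API.
final class CoPartitionSketch {

    // Map a non-null key to a partition index in [0, numPartitions).
    // Assumption: imitates a non-negative modulo of key.hashCode().
    static int partitionFor(Object key, int numPartitions) {
        int mod = key.hashCode() % numPartitions;
        return mod < 0 ? mod + numPartitions : mod;
    }

    // True if the given partition index is stored on the same node for
    // both RDDs. The arrays are hypothetical placements: element i names
    // the node that holds partition i.
    static boolean coLocated(String[] nodesOfRdd1, String[] nodesOfRdd2, int partition) {
        return nodesOfRdd1[partition].equals(nodesOfRdd2[partition]);
    }

    public static void main(String[] args) {
        int numPartitions = 4;

        // Hypothetical placements of the partitions of two RDDs.
        String[] nodesOfRdd1 = {"node-A", "node-B", "node-C", "node-D"};
        String[] nodesOfRdd2 = {"node-A", "node-C", "node-B", "node-D"};

        for (String key : new String[] {"a", "b", "c", "d"}) {
            // Same partitioner and same partition count: the index is
            // identical for both RDDs (co-partitioned).
            int p = partitionFor(key, numPartitions);

            // The node that holds that index may still differ
            // (not co-located), so a join would move data.
            System.out.println("key=" + key
                    + " partition=" + p
                    + " rdd1Node=" + nodesOfRdd1[p]
                    + " rdd2Node=" + nodesOfRdd2[p]
                    + " coLocated=" + coLocated(nodesOfRdd1, nodesOfRdd2, p));
        }
    }
}
```

In this sketch every key gets the same partition index in both RDDs, yet
for some indices the two placements name different nodes. That second
mismatch is the one I would like to avoid.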

Thank you


