Hello,

Is there a way to instruct treeReduce() to reduce RDD partitions on the same node locally?
In my case, I'm using treeReduce() to reduce map results in parallel. My reduce function simply adds the map values arithmetically (i.e. there is no notion of aggregation by key). As far as I understand, a shuffle happens at each treeReduce() stage, using a hash partitioner with the RDD partition index as input. I would like to force RDD partitions on the same node to be reduced locally (i.e. with no shuffling), and to shuffle only once each node is down to a single partition, just before the results are sent to the driver. I have a few nodes and many partitions, so I think this would give better performance.

Thank you,
Ayman

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Locality-aware-tree-reduction-tp26885.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
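To illustrate the behavior I'm after, something along these lines is what I have in mind. This is only a sketch, not a confirmed treeReduce() feature: it relies on coalesce(shuffle = false), whose default coalescer prefers merging co-located partitions, so each node reduces its own partitions locally and only the final per-node values cross the network. The names LocalityAwareReduce and numNodes are placeholders of mine, not anything from Spark.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object LocalityAwareReduce {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("locality-aware-reduce").setMaster("local[*]"))
    val rdd = sc.parallelize(1L to 1000000L, numSlices = 200)

    val numNodes = 4 // placeholder: aim for one merged partition per node

    val sum = rdd
      // coalesce with shuffle = false narrows partitions without a network
      // shuffle; the default partition coalescer groups co-located partitions
      .coalesce(numNodes, shuffle = false)
      // reduce each merged partition locally to a single value
      .mapPartitions(it => Iterator(it.foldLeft(0L)(_ + _)))
      // only these few per-node values are combined and sent to the driver
      .reduce(_ + _)

    println(sum)
    sc.stop()
  }
}
```

The open question is whether treeReduce() itself can be told to do this, rather than approximating it with coalesce.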