Hello,

Is there a way to instruct treeReduce() to reduce RDD partitions on the
same node locally?

In my case, I'm using treeReduce() to reduce map results in parallel. My
reduce function is just arithmetically adding map results (i.e. no notion
of aggregation by key). As far as I understand, a shuffle will happen at
each treeReduce() stage using a hash partitioner with the RDD partition
index as input. I would like to enforce RDD partitions on the same node to
be reduced locally and only shuffle when each node has one RDD partition
left and before the results are sent to the driver.

I have few nodes and lots of partitions so I think this will give better
performance.

Thank you,
Ayman

Reply via email to