viirya edited a comment on pull request #30392: URL: https://github.com/apache/spark/pull/30392#issuecomment-728773432
> I am not sure I follow - the `reduce` will reduce it at driver - based on the individual priority queues per partition - while `treeReduce` will progressively reduce it in executors before pulling final pq result to driver.
> What is the concern here?

`RDD.reduce` does not only run a driver-side reduce; it also runs a reduce within each partition on the executor side. For `takeOrdered`, that executor-side reduce is unnecessary, because each partition holds only one element (the priority queue), so there is nothing to combine within a partition and every per-partition queue is still merged on the driver.

I think we should either remove `reduce` from `takeOrdered` and simply do a driver-side reduce, or use `treeReduce`, which can actually merge queues on the executor side in the `takeOrdered` case.

I read the code a few times and this makes sense to me, but I'm not sure whether I'm missing anything. Please let me know.
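
To make the difference concrete, here is a rough sketch in plain Python rather than Spark code. It is only a model under stated assumptions: a sorted list of the `k` smallest items stands in for the per-partition priority queue, and the helper names (`partition_queue`, `merge`, `driver_side_reduce`, `tree_style_reduce`) and the `fanout` parameter are hypothetical, not part of Spark's API. The tree variant is a single simplified level of grouping and does not reproduce how `treeReduce` actually chooses its depth or scale.

```python
import heapq


def partition_queue(items, k):
    """Per-partition step: keep only the k smallest items (stand-in for the priority queue)."""
    return heapq.nsmallest(k, items)


def merge(q1, q2, k):
    """Combine two bounded queues into one bounded queue of the k smallest items."""
    return heapq.nsmallest(k, q1 + q2)


def driver_side_reduce(queues, k):
    """Model of merging every incoming queue in one place (the driver).

    Returns the merged result and how many merges the driver performed.
    """
    result, driver_merges = [], 0
    for q in queues:
        result = merge(result, q, k)
        driver_merges += 1
    return result, driver_merges


def tree_style_reduce(queues, k, fanout=2):
    """Simplified model of a tree reduce: groups of `fanout` queues are merged
    first (standing in for executor-side work), then the driver merges only
    the partial results. `fanout` is a hypothetical knob for this sketch.
    """
    partials = []
    for i in range(0, len(queues), fanout):
        acc = []
        for q in queues[i:i + fanout]:
            acc = merge(acc, q, k)
        partials.append(acc)
    return driver_side_reduce(partials, k)


# Four "partitions", each already reduced to a single queue.
partitions = [[9, 1, 7], [4, 8, 2], [6, 3, 5], [0, 11, 10]]
k = 3
queues = [partition_queue(p, k) for p in partitions]

flat_result, flat_merges = driver_side_reduce(queues, k)
tree_result, tree_merges = tree_style_reduce(queues, k, fanout=2)
```

Both paths give the same answer. In this model the only difference is how many queues the driver has to merge itself: one per partition in the flat case, and one per group in the tree case.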