[GitHub] [spark] viirya edited a comment on pull request #30392: [SPARK-33465][CORE] RDD.takeOrdered should get rid of usage of reduce or use treeReduce instead

GitBox Tue, 17 Nov 2020 00:40:47 -0800


viirya edited a comment on pull request #30392:
URL: https://github.com/apache/spark/pull/30392#issuecomment-728773432



   > I am not sure I follow - the `reduce` will reduce it at driver - based on 
the individual priority queues per partition - while `treeReduce` will 
progressively reduce it in executors before pulling final pq result to driver.
   > What is the concern here ?
   
   `RDD.reduce` does not only run a driver-side reduce; it also reduces within 
each partition on the executor side. For `takeOrdered`, the executor-side reduce 
is unnecessary because each partition holds just one element (the priority queue).
   
   I think we should either remove `reduce` from `takeOrdered` and simply do a 
driver-side reduce, or use `treeReduce`, which can actually do an executor-side 
reduce in the `takeOrdered` case.
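
To make the two options concrete, here is a rough plain-Python model. It is only a sketch and not Spark's code: partitions are modelled as lists, each partition's bounded priority queue as a sorted list of at most `k` items, and the helper names (`top_k_queue`, `merge_queues`, `driver_side_reduce`, `tree_reduce`) and the `scale` parameter are all made up for illustration.

```python
import heapq
from functools import reduce


def top_k_queue(partition, k):
    """Per-partition step: keep only the k smallest elements (one queue per partition)."""
    return heapq.nsmallest(k, partition)


def merge_queues(q1, q2, k):
    """Merge two bounded queues, keeping the k smallest elements overall."""
    return heapq.nsmallest(k, q1 + q2)


def driver_side_reduce(queues, k):
    """Option 1: collect one queue per partition and merge them all in one place."""
    return reduce(lambda a, b: merge_queues(a, b, k), queues, [])


def tree_reduce(queues, k, scale=2):
    """Option 2: merge queues in groups over several rounds (a treeReduce-style
    model), so the final merge only has to combine a few queues."""
    assert scale >= 2, "hypothetical parameter; must be at least 2 to make progress"
    level = list(queues)
    while len(level) > scale:
        level = [
            reduce(lambda a, b: merge_queues(a, b, k), level[i:i + scale])
            for i in range(0, len(level), scale)
        ]
    return reduce(lambda a, b: merge_queues(a, b, k), level, [])


# Five modelled partitions, one of them empty.
partitions = [[9, 4, 7], [1, 8], [], [6, 2, 5], [3]]
k = 3
queues = [top_k_queue(p, k) for p in partitions]

flat = driver_side_reduce(queues, k)
tree = tree_reduce(queues, k)
print(flat, tree)
```

Both paths give the same result; the only difference in this model is where the merging work happens. With a single queue per partition, a per-partition reduce has nothing to combine, which is the point made above.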
   
   I read the code a few times and this makes sense to me. I'm not sure whether 
I am missing anything. Please let me know.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



