[GitHub] [spark] viirya commented on pull request #30392: [SPARK-33465][CORE] RDD.takeOrdered should get rid of usage of reduce or use treeReduce instead

GitBox Tue, 17 Nov 2020 18:11:30 -0800


viirya commented on pull request #30392:
URL: https://github.com/apache/spark/pull/30392#issuecomment-729330116



   > There will not be an additional map task - it will get pipelined with the 
`mapPartitions` - with the `iter.reduceLeft` in `reduce` working on a single 
element. Essentially, I am not sure what this change is buying us.
   
   Oh, yeah, you're right. I missed that. So my only concern is that the `reduce` 
usage here made me wonder whether it really reduces elements across partitions, 
but it doesn't. It just does a driver-side reduce.
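
To make the point concrete, here is a rough plain-Python simulation (not Spark code) of the shape being discussed: each partition yields a single bounded queue, the per-partition reduce sees one element and returns it unchanged, and every merge ends up on the driver. The names `top_k`, `merge`, and `take_ordered_driver_side` are hypothetical, and sorted lists stand in for the priority queues.

```python
import heapq
from functools import reduce


def top_k(items, k):
    """Smallest k items, sorted (stands in for a per-partition bounded queue)."""
    return heapq.nsmallest(k, items)


def merge(q1, q2, k):
    """Merge two partial results and keep only the smallest k."""
    return heapq.nsmallest(k, q1 + q2)


def take_ordered_driver_side(partitions, k):
    """Simulated takeOrdered: returns (result, number of merges done on the driver)."""
    if not partitions:
        return [], 0
    # Per partition: the iterator handed to the reduce step holds exactly ONE queue.
    per_partition = [[top_k(p, k)] for p in partitions]
    # Per partition: reducing a single-element iterator returns that element
    # unchanged, so no merging happens on the executor side.
    partials = [reduce(lambda a, b: merge(a, b, k), it) for it in per_partition]
    # Driver: all partial queues are merged here, one after another.
    result = partials[0]
    driver_merges = 0
    for q in partials[1:]:
        result = merge(result, q, k)
        driver_merges += 1
    return result, driver_merges
```

With four partitions, all three merges are counted on the driver, which is the "driver-side reduce" described above.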
   
   > If the concern had been that the driver is handling all the priority 
queues - I can see that being an issue (that is a general critique on reduce 
itself).
   
   I think this is where `treeReduce` can help.
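
As a hedged illustration of why a tree-shaped reduce helps, the plain-Python sketch below combines partial top-k lists in rounds so that only the last few merges are left for the driver. It is a simulation of the general idea, not Spark's `treeReduce` implementation; `tree_merge` and its `fan_in` parameter are hypothetical names, and sorted lists stand in for the priority queues.

```python
import heapq


def merge(q1, q2, k):
    """Merge two partial results and keep only the smallest k."""
    return heapq.nsmallest(k, q1 + q2)


def tree_merge(partials, k, fan_in=2):
    """Combine sorted partial top-k lists in rounds.

    Returns (result, merges done in intermediate rounds, merges left for the driver).
    """
    if fan_in < 2:
        raise ValueError("fan_in must be at least 2")
    level = list(partials)
    if not level:
        return [], 0, 0
    round_merges = 0
    # Intermediate rounds: groups of `fan_in` partial results are merged
    # away from the driver until only a few remain.
    while len(level) > fan_in:
        next_level = []
        for i in range(0, len(level), fan_in):
            group = level[i:i + fan_in]
            acc = group[0]
            for q in group[1:]:
                acc = merge(acc, q, k)
                round_merges += 1
            next_level.append(acc)
        level = next_level
    # Final step: the driver merges only what is left.
    result = level[0]
    driver_merges = 0
    for q in level[1:]:
        result = merge(result, q, k)
        driver_merges += 1
    return result, round_merges, driver_merges
```

For eight partial results with `fan_in=2`, this does six merges in the intermediate rounds and leaves a single merge for the driver, instead of seven merges all in one place.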
   


