viirya commented on pull request #30392: URL: https://github.com/apache/spark/pull/30392#issuecomment-729330116
> There will not be an additional map task - it will get pipelined with the `mapPartitions` - with the `iter.reduceLeft` in `reduce` working on a single element. Essentially, I am not sure what this change is buying us.

Oh, yeah, you're right. I missed that. So my only concern is that the `reduce` usage here made me wonder whether it really "reduces" elements across partitions, but it actually doesn't. It is just doing a driver-side reduce.

> If the concern had been that the driver is handling all the priority queues - I can see that being an issue (that is a general critique on reduce itself).

I think this is where `treeReduce` can improve things.
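To make the distinction above concrete, here is a minimal plain-Scala sketch with no Spark dependency. It is only a model under stated assumptions, not Spark's actual implementation: each inner `Seq` stands in for one partition, and `ReduceSketch`, `driverSideReduce`, `treeLikeReduce`, and the `fanIn` parameter are hypothetical names introduced for illustration. Each function returns the reduced value together with the number of partial results the final (driver-side) merge has to handle.

```scala
object ReduceSketch {
  // Per-partition step, mirroring the `iter.reduceLeft` call mentioned above.
  // Empty partitions contribute no partial result.
  def reducePartition[T](iter: Iterator[T], f: (T, T) => T): Option[T] =
    if (iter.hasNext) Some(iter.reduceLeft(f)) else None

  // `reduce`-like model: every partial result is merged in a single place.
  // Throws UnsupportedOperationException if all partitions are empty.
  def driverSideReduce[T](partitions: Seq[Seq[T]], f: (T, T) => T): (T, Int) = {
    val partials: List[T] =
      partitions.toList.flatMap(p => reducePartition(p.iterator, f).toList)
    (partials.reduceLeft(f), partials.size)
  }

  // `treeReduce`-like model (assumption: simple fixed fan-in, fanIn >= 2):
  // partial results are combined in intermediate levels, so the final merge
  // sees at most `fanIn` values.
  def treeLikeReduce[T](partitions: Seq[Seq[T]], f: (T, T) => T, fanIn: Int = 2): (T, Int) = {
    require(fanIn >= 2, "fanIn must be at least 2")
    var level: List[T] =
      partitions.toList.flatMap(p => reducePartition(p.iterator, f).toList)
    while (level.size > fanIn) {
      level = level.grouped(fanIn).map(_.reduceLeft(f)).toList
    }
    (level.reduceLeft(f), level.size)
  }

  def main(args: Array[String]): Unit = {
    val partitions = Seq(Seq(1, 2), Seq(3), Seq(4, 5, 6), Seq(7, 8))
    val max = (a: Int, b: Int) => math.max(a, b)
    println(driverSideReduce(partitions, max)) // (8,4): final merge sees 4 partials
    println(treeLikeReduce(partitions, max))   // (8,2): final merge sees 2 partials
  }
}
```

With small per-partition results the two approaches behave almost the same. The difference matters when each partial result is large (for example, one priority queue per partition), because the single merge point in the first model has to hold and combine all of them.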