[GitHub] [spark] mridulm commented on pull request #30392: [SPARK-33465][CORE] RDD.takeOrdered should get rid of usage of reduce or use treeReduce instead

2020-11-17 Thread GitBox


mridulm commented on pull request #30392:
URL: https://github.com/apache/spark/pull/30392#issuecomment-729318977


   > I think so. In some cases, unnecessary executor-side reduce might invoke 
an additional map task although it just returns the single element. So this is 
just a minor concern for me.
   
   There will not be an additional reduce-side task: the work gets pipelined 
with the `mapPartitions` in the same task, and the `iter.reduceLeft` in `reduce` 
then operates on a single element, so it just returns that element. 
Essentially, I am not sure what this change is buying us.
   
   If the concern had been that the driver is handling all the priority queues, 
I can see that being an issue (that is a general critique of `reduce` itself).
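
   The single-element point can be checked without Spark. The sketch below is 
plain Scala, not Spark's source: `SingleElementReduce` and `reducePartition` are 
hypothetical names, and a sorted `List[Int]` stands in for the per-partition 
priority queue. It only shows that `reduceLeft` over an iterator holding one 
element hands that element back without ever calling the merge function.

```scala
object SingleElementReduce {
  // Hypothetical stand-in for the per-partition step of a reduce:
  // fold whatever the iterator holds and count how often the merge
  // function is actually invoked.
  def reducePartition(iter: Iterator[List[Int]], k: Int): (Option[List[Int]], Int) = {
    var mergeCalls = 0
    // Merge two top-k lists, keeping the k smallest values.
    val merge: (List[Int], List[Int]) => List[Int] = (a, b) => {
      mergeCalls += 1
      (a ++ b).sorted.take(k)
    }
    val result = if (iter.hasNext) Some(iter.reduceLeft(merge)) else None
    (result, mergeCalls)
  }

  def main(args: Array[String]): Unit = {
    // One partition that produced exactly one top-k collection.
    val (result, calls) = reducePartition(Iterator(List(1, 3)), k = 2)
    println(s"result=$result mergeCalls=$calls")
  }
}
```

   With a single element in the iterator, the merge counter stays at zero, 
which is the case described above.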



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] mridulm commented on pull request #30392: [SPARK-33465][CORE] RDD.takeOrdered should get rid of usage of reduce or use treeReduce instead

2020-11-16 Thread GitBox


mridulm commented on pull request #30392:
URL: https://github.com/apache/spark/pull/30392#issuecomment-728746062


   I am not sure I follow. `reduce` will merge the individual per-partition 
priority queues at the driver, while `treeReduce` will progressively merge them 
in executors before pulling the final priority queue to the driver.
   What is the concern here?
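
   To make the contrast concrete, here is a rough sketch in plain Scala with no 
Spark dependency. All names (`ReduceVsTreeReduce`, `flatMerge`, `treeMerge`, 
`scale`) are hypothetical, a sorted `Seq[Int]` stands in for each partition's 
priority queue, and the grouping loop is a simplified stand-in for 
executor-side merging rather than Spark's actual `treeReduce` logic. Each 
function returns the merged result together with the number of partial results 
that the final, driver-like step has to handle.

```scala
object ReduceVsTreeReduce {
  // Merge two top-k partial results, keeping the k smallest values.
  def mergeTopK(k: Int): (Seq[Int], Seq[Int]) => Seq[Int] =
    (a, b) => (a ++ b).sorted.take(k)

  // reduce-style: every partial result is merged in one place.
  def flatMerge(partials: Seq[Seq[Int]], k: Int): (Seq[Int], Int) =
    (partials.reduceLeft(mergeTopK(k)), partials.size)

  // treeReduce-style: merge in groups of `scale` until few partials remain,
  // then do the final merge over what is left.
  def treeMerge(partials: Seq[Seq[Int]], k: Int, scale: Int): (Seq[Int], Int) = {
    require(scale >= 2, "scale must be at least 2")
    var level = partials
    while (level.size > scale) {
      level = level.grouped(scale).map(_.reduceLeft(mergeTopK(k))).toVector
    }
    (level.reduceLeft(mergeTopK(k)), level.size)
  }

  def main(args: Array[String]): Unit = {
    // Eight simulated partitions, each holding its own top-2 result.
    val partials = (0 until 8).map(i => Seq(i, i + 8))
    val (flat, flatSeen) = flatMerge(partials, k = 2)
    val (tree, treeSeen) = treeMerge(partials, k = 2, scale = 2)
    println(s"flat: $flat, partials at final step: $flatSeen")
    println(s"tree: $tree, partials at final step: $treeSeen")
  }
}
```

   Both paths produce the same top-k result; in this sketch the only 
difference is how many partial results reach the final merge.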


