mridulm commented on pull request #28911: URL: https://github.com/apache/spark/pull/28911#issuecomment-656595748
The fix for this need not come in this PR; it can be a follow-up feature addition. Note that local shuffle reads across executors on a node will only deliver their full benefit once locality preference also accounts for them; until then, the potential benefit is reduced. The solution is fairly straightforward given the existing implementation of `getLocationsWithLargestOutputs`: when local reads across executors on a node are possible, aggregate output sizes by host instead of by block manager ID. This PR and #25299 are candidates for where this can be enabled (with suitable flag checks, etc.).
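
To make the suggestion concrete, here is a minimal, standalone sketch of the aggregation change. It is not Spark code: the `BlockManagerId` tuple, the `locations_with_largest_outputs` function, the `by_host` flag, and the sample sizes are all hypothetical stand-ins, written in Python only to illustrate the idea of summing map output sizes per host rather than per block manager and then keeping the locations whose share meets a fraction threshold.

```python
from collections import defaultdict
from typing import NamedTuple


class BlockManagerId(NamedTuple):
    """Hypothetical stand-in for a block manager identity (one per executor)."""
    executor_id: str
    host: str
    port: int


def locations_with_largest_outputs(map_outputs, fraction_threshold, by_host=False):
    """Return the locations holding at least `fraction_threshold` of the
    total map output for one reducer.

    map_outputs: iterable of (BlockManagerId, size_in_bytes) pairs.
    by_host:     if True, sum sizes per host; otherwise per block manager.
                 (Illustrative flag, not an actual Spark setting.)
    """
    totals = defaultdict(int)
    total_size = 0
    for location, size in map_outputs:
        key = location.host if by_host else location
        totals[key] += size
        total_size += size
    if total_size == 0:
        return []
    return sorted(
        key for key, size in totals.items()
        if size / total_size >= fraction_threshold
    )


# Two executors on hostA each hold 15% of the output; one on hostB holds 70%.
exec1 = BlockManagerId("1", "hostA", 7001)
exec2 = BlockManagerId("2", "hostA", 7002)
exec3 = BlockManagerId("3", "hostB", 7001)
outputs = [(exec1, 15), (exec2, 15), (exec3, 70)]

per_executor = locations_with_largest_outputs(outputs, 0.2)
per_host = locations_with_largest_outputs(outputs, 0.2, by_host=True)
```

With a threshold of 0.2, the per-executor aggregation keeps only the executor on `hostB`, while the per-host aggregation also keeps `hostA`, because its two executors together hold 30% of the output. That is the case where a reducer placed on `hostA` could read those blocks locally across executors.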