[GitHub] [spark] mridulm commented on pull request #28911: [SPARK-32077][CORE] Support host-local shuffle data reading when external shuffle service is disabled

GitBox Fri, 10 Jul 2020 03:05:12 -0700


mridulm commented on pull request #28911:
URL: https://github.com/apache/spark/pull/28911#issuecomment-656595748



   The fix for this need not come in this PR; it can be a follow-up feature 
addition.
   Note that local shuffle reads across executors on a node will benefit most 
once locality preference also accounts for them; until then, the potential 
benefit is reduced.
   
   The solution is fairly straightforward, given the existing implementation 
of `getLocationsWithLargestOutputs`: when local reads across executors on a 
node are possible, aggregate output sizes by host instead of by block manager 
id. This PR and #25299 are the candidate cases in which this can be enabled 
(with suitable flag checks, etc.).
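
   A minimal sketch of the idea, not the actual Spark code: `ExecutorLoc`, 
`MapOutput`, `HostAwareLocality` and the `hostLocalReadEnabled` flag are 
hypothetical, simplified stand-ins for `BlockManagerId`, `MapStatus` and 
whatever flag checks would gate the behaviour. It only illustrates how the 
aggregation key changes from executor to host.

```scala
// Hypothetical stand-ins for Spark's BlockManagerId and MapStatus.
final case class ExecutorLoc(executorId: String, host: String)
final case class MapOutput(loc: ExecutorLoc, sizes: Array[Long])

object HostAwareLocality {
  /** Returns the locations holding at least `fractionThreshold` of the
    * total map output for `reducerId`. Locations are hosts when
    * `hostLocalReadEnabled` is true, otherwise individual executors.
    */
  def locationsWithLargestOutputs(
      outputs: Seq[MapOutput],
      reducerId: Int,
      fractionThreshold: Double,
      hostLocalReadEnabled: Boolean): Set[String] = {
    // Aggregation key: the host if any executor on that host can read the
    // block locally, else the executor itself.
    def key(loc: ExecutorLoc): String =
      if (hostLocalReadEnabled) loc.host
      else s"${loc.executorId}@${loc.host}"

    // Sum the non-empty block sizes for this reducer per key.
    val sizeByKey: Map[String, Long] = outputs
      .map(o => key(o.loc) -> o.sizes(reducerId))
      .filter { case (_, size) => size > 0 }
      .groupBy { case (k, _) => k }
      .map { case (k, pairs) => k -> pairs.map(_._2).sum }

    val total = sizeByKey.values.sum
    if (total == 0L) Set.empty[String]
    else sizeByKey.collect {
      case (k, size) if size.toDouble / total >= fractionThreshold => k
    }.toSet
  }
}
```

   For example, with two executors on host `h1` holding 30 bytes each and one 
executor on `h2` holding 40 bytes, a threshold of 0.5 selects no location 
when aggregating per executor, but selects `h1` (60 of 100 bytes) when 
aggregating per host.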


