vanzin commented on a change in pull request #25299: [SPARK-27651][Core] Avoid the network when shuffle blocks are fetched from the same host
URL: https://github.com/apache/spark/pull/25299#discussion_r312674680
 
 

 ##########
 File path: core/src/main/scala/org/apache/spark/storage/BlockManagerMasterEndpoint.scala
 ##########
 @@ -51,6 +51,9 @@ class BlockManagerMasterEndpoint(
   // Mapping from block manager id to the block manager's information.
   private val blockManagerInfo = new mutable.HashMap[BlockManagerId, BlockManagerInfo]
 
+  // Mapping from executor id to the block manager's local disk directories.
+  private val executorIdToLocalDirs = new mutable.HashMap[String, Array[String]]
 
 Review comment:
   I don't think letting this map grow unbounded is a good idea.
   
   One idea that I haven't really thought through (or checked for viability) is to look into the `MapOutputTracker` and clean this up when no more known outputs are needed from that executor's host.
   
   (Or maybe even keep this cache inside the `MapOutputTracker`?)
   
   Otherwise, it would be good to put a limit on this cache (e.g. with an LRU eviction policy). The worst that happens is what Imran says: you'll go over the network.
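
A minimal sketch of the LRU option, not taken from the PR: it wraps `java.util.LinkedHashMap` in access order and overrides `removeEldestEntry` so that the least recently used executor's directories are dropped once a limit is reached. The class name `BoundedLocalDirsCache` and the `maxEntries` parameter are hypothetical, and the sketch assumes the map is only touched from a single thread.

```scala
import java.util.{LinkedHashMap => JLinkedHashMap, Map => JMap}

// Hypothetical helper (not part of the PR): maps executor id to local dirs
// and evicts the least recently accessed entry once maxEntries is exceeded.
// Not thread-safe; assumes single-threaded access.
class BoundedLocalDirsCache(maxEntries: Int) {
  // accessOrder = true keeps entries ordered from least to most recently accessed.
  private val cache =
    new JLinkedHashMap[String, Array[String]](16, 0.75f, true) {
      // Called by LinkedHashMap after each insertion; returning true
      // removes the eldest (least recently accessed) entry.
      override def removeEldestEntry(
          eldest: JMap.Entry[String, Array[String]]): Boolean = {
        size() > maxEntries
      }
    }

  def put(executorId: String, localDirs: Array[String]): Unit = {
    cache.put(executorId, localDirs)
  }

  // A lookup counts as an access, so it refreshes the entry's position.
  def get(executorId: String): Option[Array[String]] = {
    Option(cache.get(executorId))
  }

  def size: Int = cache.size()
}
```

A miss on `get` after eviction would simply mean the caller falls back to the regular network fetch, which is the worst case mentioned above.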
