venkata91 commented on a change in pull request #30164:
URL: https://github.com/apache/spark/pull/30164#discussion_r516910659



##########
File path: 
core/src/main/scala/org/apache/spark/storage/BlockManagerMasterEndpoint.scala
##########
@@ -657,6 +679,14 @@ class BlockManagerMasterEndpoint(
     }
   }
 
+  private def getMergerLocations(
+      numMergersNeeded: Int,
+      hostsToFilter: Set[String]): Seq[BlockManagerId] = {
+    // Copying the merger locations to a list so that the original mergerLocations won't be shuffled
+    val mergers = mergerLocations.values.filterNot(x => hostsToFilter.contains(x.host)).toSeq
+    Utils.randomize(mergers).take(numMergersNeeded)

Review comment:
       For now, I have avoided shuffling the merger locations in
`BlockManagerMasterEndpoint`, but on second thought: we only send
`mergersNeeded` locations, which can be fewer than the total number of
`shuffleMergers` available. Shouldn't we shuffle in `BlockManagerMasterEndpoint`
itself, since it has the whole list and returns only the required number of
hosts? We also shuffled the hosts info to avoid unnecessary node hotspots.
@tgravescs Thoughts?
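A minimal sketch of the "shuffle on the master, then take only what is needed" idea, assuming a simplified setup: `MergerId` is a hypothetical stand-in for `BlockManagerId` (only `host` matters here), `MergerSelection.selectMergers` is a hypothetical name, and `scala.util.Random.shuffle` is used in place of Spark's internal `Utils.randomize` so the snippet compiles on its own. Because the shuffle happens over the full filtered list before `take`, repeated calls can hand out different subsets of hosts instead of always the same first few.

```scala
import scala.util.Random

// Hypothetical stand-in for BlockManagerId; only the host is used for filtering.
final case class MergerId(executorId: String, host: String, port: Int)

object MergerSelection {
  // Filters out excluded hosts, shuffles the remaining candidates, and returns
  // at most numMergersNeeded of them.
  def selectMergers(
      mergerLocations: Map[String, MergerId],
      numMergersNeeded: Int,
      hostsToFilter: Set[String],
      rng: Random = new Random()): Seq[MergerId] = {
    // toSeq builds a new collection, so the original map is never reordered.
    val candidates =
      mergerLocations.values.filterNot(m => hostsToFilter.contains(m.host)).toSeq
    // Shuffle the whole candidate list first, then take the requested count.
    rng.shuffle(candidates).take(numMergersNeeded)
  }
}
```

If fewer candidates remain than `numMergersNeeded`, `take` simply returns all of them, so the caller has to decide whether a short list is acceptable.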



