Github user markhamstra commented on a diff in the pull request: https://github.com/apache/spark/pull/1106#discussion_r17283247 --- Diff: core/src/main/scala/org/apache/spark/deploy/master/Master.scala --- @@ -481,13 +481,27 @@ private[spark] class Master( if (state != RecoveryState.ALIVE) { return } // First schedule drivers, they take strict precedence over applications - val shuffledWorkers = Random.shuffle(workers) // Randomization helps balance drivers - for (worker <- shuffledWorkers if worker.state == WorkerState.ALIVE) { - for (driver <- List(waitingDrivers: _*)) { // iterate over a copy of waitingDrivers - if (worker.memoryFree >= driver.desc.mem && worker.coresFree >= driver.desc.cores) { + // Randomization helps balance drivers + val shuffledWorkers = Random.shuffle(workers).toArray --- End diff -- This just looks wrong or pointless to me. Because `workers` is a HashSet, it's not ordered. Logically, shuffling a Set (i.e., changing the order in which elements are added to the Set) just gets you the same Set back again. I think that is also what actually happens with a HashSet: ``` scala> import scala.util.Random import scala.util.Random scala> val aHashSet = scala.collection.mutable.HashSet(...) aHashSet: scala.collection.mutable.HashSet[...] = Set(...) scala> val shuffledSet = Random.shuffle(aHashSet) shuffledSet: scala.collection.mutable.HashSet[...] = Set(...) scala> { for (i <- 0 until aHashSet.size) yield aHashSet(i) == shuffledSet(i) }.forall(_ == true) res0: Boolean = true ``` To actually accomplish the reordering, you need to convert to a Seq before doing the shuffling: ```scala val shuffledWorkers = Random.shuffle(workers.toSeq) // yields an ArrayBuffer ``` And I'll ask again, what is the point of including workers whose state is not `WorkerState.ALIVE` and going through the trouble of shuffling in these workers that will only be filtered out later? I really think that what you want is: ```scala val shuffledWorkers = Random.shuffle(workers.toSeq.filter(_.state == WorkerState.ALIVE)) ```
--- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- --------------------------------------------------------------------- To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org