[GitHub] spark pull request: [SPARK-3411] Improve load-balancing of concurr...

markhamstra Mon, 08 Sep 2014 22:45:44 -0700

Github user markhamstra commented on a diff in the pull request:

    https://github.com/apache/spark/pull/1106#discussion_r17283247
  
    --- Diff: core/src/main/scala/org/apache/spark/deploy/master/Master.scala 
---
    @@ -481,13 +481,27 @@ private[spark] class Master(
         if (state != RecoveryState.ALIVE) { return }
     
         // First schedule drivers, they take strict precedence over 
applications
    -    val shuffledWorkers = Random.shuffle(workers) // Randomization helps 
balance drivers
    -    for (worker <- shuffledWorkers if worker.state == WorkerState.ALIVE) {
    -      for (driver <- List(waitingDrivers: _*)) { // iterate over a copy of 
waitingDrivers
    -        if (worker.memoryFree >= driver.desc.mem && worker.coresFree >= 
driver.desc.cores) {
    +    // Randomization helps balance drivers
    +    val shuffledWorkers = Random.shuffle(workers).toArray
    --- End diff --
    
    This just looks wrong or pointless to me.  Because `workers` is a HashSet, 
it's not ordered.  Logically, shuffling a Set (i.e., changing the order in 
which elements are added to the Set) just gets you the same Set back again.  I 
think that is also what actually happens with a HashSet: 
    ```
    scala> import scala.util.Random
    import scala.util.Random
    
    scala> val aHashSet = scala.collection.mutable.HashSet(...)
    aHashSet: scala.collection.mutable.HashSet[...] = Set(...)
    
    scala> val shuffledSet = Random.shuffle(aHashSet)
    shuffledSet: scala.collection.mutable.HashSet[...] = Set(...)
    
    scala> { for (i <- 0 until aHashSet.size) yield aHashSet(i) == 
shuffledSet(i) }.forall(_ == true)
    res0: Boolean = true
    ```
    To actually accomplish the reordering, you need to convert to a Seq before 
doing the shuffling:
    ```scala
    val shuffledWorkers = Random.shuffle(workers.toSeq)  // yields an 
ArrayBuffer
    ```
    
    And I'll ask again, what is the point of including workers whose state is 
not `WorkerState.ALIVE` and going through the trouble of shuffling in these 
workers that will only be filtered out later?  I really think that what you 
want is:
    ```scala
    val shuffledWorkers = Random.shuffle(workers.toSeq.filter(_.state == 
WorkerState.ALIVE))
    ```




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-3411] Improve load-balancing of concurr...

Reply via email to