squito commented on a change in pull request #23677: [SPARK-26755][SCHEDULER] : Optimize Spark Scheduler to dequeue speculative tasks…

URL: https://github.com/apache/spark/pull/23677#discussion_r304136584
##########
File path: core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala
##########
@@ -1054,6 +1043,19 @@ private[spark] object TaskSetManager {
   val TASK_SIZE_TO_WARN_KIB = 1000
 }
+// Set of pending tasks for various levels of locality: executor, host, rack,
+// noPrefs and anyPrefs. These collections are actually
+// treated as stacks, in which new tasks are added to the end of the
+// ArrayBuffer and removed from the end. This makes it faster to detect
+// tasks that repeatedly fail because whenever a task failed, it is put
+// back at the head of the stack. These collections may contain duplicates
+// for two reasons:
+// (1): Tasks are only removed lazily; when a task is launched, it remains
+// in all the pending lists except the one that it was launched from.
+// (2): Tasks may be re-added to these lists multiple times as a result
+// of failures.
+// Duplicates are handled in dequeueTaskFromList, which ensures that a
+// task hasn't already started running before launching it.

Review comment:
   Turn this into a scaladoc comment so it shows up in IDEs for PendingTasksByLocality:
   ```
   /**
    * Set of ...
   ```
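For reference, a minimal sketch of what the suggested conversion might look like. The class name `PendingTasksByLocality` comes from the review comment itself; the field names and types below are illustrative assumptions about the structure this PR introduces, not a verbatim copy of it:

```scala
package org.apache.spark.scheduler

import scala.collection.mutable.{ArrayBuffer, HashMap}

/**
 * Set of pending tasks for various levels of locality: executor, host, rack,
 * noPrefs and anyPrefs. These collections are actually treated as stacks, in
 * which new tasks are added to the end of the ArrayBuffer and removed from the
 * end. This makes it faster to detect tasks that repeatedly fail because
 * whenever a task failed, it is put back at the head of the stack. These
 * collections may contain duplicates for two reasons:
 * (1): Tasks are only removed lazily; when a task is launched, it remains in
 *      all the pending lists except the one that it was launched from.
 * (2): Tasks may be re-added to these lists multiple times as a result of
 *      failures.
 * Duplicates are handled in dequeueTaskFromList, which ensures that a task
 * hasn't already started running before launching it.
 */
private[scheduler] class PendingTasksByLocality {

  // Assumed fields, for illustration only: pending task indices keyed by
  // executor, host, and rack, plus the no-preference and catch-all stacks.
  val forExecutor = new HashMap[String, ArrayBuffer[Int]]
  val forHost = new HashMap[String, ArrayBuffer[Int]]
  val noPrefs = new ArrayBuffer[Int]
  val forRack = new HashMap[String, ArrayBuffer[Int]]
  val all = new ArrayBuffer[Int]
}
```

Because the block is a `/** ... */` scaladoc attached to the class declaration rather than a free-floating `//` comment, IDEs will render it on hover and in quick documentation for `PendingTasksByLocality`, which is the point of the review suggestion.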