Github user kayousterhout commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16867#discussion_r104508618
  
    --- Diff: 
core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala ---
    @@ -740,6 +743,7 @@ private[spark] class TaskSetManager(
         }
         removeRunningTask(tid)
         info.markFinished(state)
    +    successfulTaskDurations.remove(taskInfos(tid).duration)
    --- End diff --
    
    I wouldn't bother with this, because the remove method is complicated (so 
you can eliminate it without this line of code), and the median is good because 
it's not very sensitive to outlier values, so I think it's fine to leave failed 
tasks in.  Also, I'm not sure this code is correct, because it's possible that 
the duration hadn't yet been added to successfulTaskDurations, in which case 
you could be removing the duration that happens to be the same, but actually 
corresponds to a successful task.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

Reply via email to