GitHub user ericl commented on a diff in the pull request: https://github.com/apache/spark/pull/17166#discussion_r107534830

--- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala ---
@@ -467,7 +474,7 @@ private[spark] class TaskSchedulerImpl private[scheduler](
       taskState: TaskState,
       reason: TaskFailedReason): Unit = synchronized {
     taskSetManager.handleFailedTask(tid, taskState, reason)
-    if (!taskSetManager.isZombie && taskState != TaskState.KILLED) {
+    if (!taskSetManager.isZombie) {
--- End diff --

I don't think you can get a request storm. The _only_ case this guards against is exactly the one you mentioned -- a speculative task that is killed because another attempt succeeded. The number of speculative tasks is always small, so this shouldn't be an issue. In comparison, we hit this code path much more often with failed tasks.

If we were to add this check back, then in the task kill API we would have to add a parameter indicating whether revive offers should be called. That is substantial added complexity, which can be avoided if we make this simplification at this call site. I'll leave it to @kayousterhout to decide whether it's worth it.
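To make the trade-off concrete, here is a minimal sketch (not Spark's actual code; the predicate names and the simplified `TaskState` are hypothetical) contrasting the condition before and after the change. The old check suppressed `reviveOffers` for KILLED tasks; the simplified check revives offers for any non-zombie task set, accepting a few extra (harmless) revive calls when speculative tasks are killed:

```scala
// Hypothetical stand-in for org.apache.spark.TaskState, reduced to what matters here.
sealed trait TaskState
case object Failed extends TaskState
case object Killed extends TaskState

// Condition before the change: skip reviveOffers when the task was KILLED
// (e.g. a speculative attempt killed because another attempt succeeded).
def shouldReviveOffersBefore(isZombie: Boolean, state: TaskState): Boolean =
  !isZombie && state != Killed

// Simplified condition after the change: revive offers for any non-zombie task set.
def shouldReviveOffersAfter(isZombie: Boolean): Boolean =
  !isZombie

// The only behavioral difference: killed tasks in a live task set now trigger a revive.
assert(!shouldReviveOffersBefore(isZombie = false, state = Killed))
assert(shouldReviveOffersAfter(isZombie = false))

// Both versions agree on the common (failed-task) path and on zombie task sets.
assert(shouldReviveOffersBefore(isZombie = false, state = Failed))
assert(!shouldReviveOffersBefore(isZombie = true, state = Failed))
assert(!shouldReviveOffersAfter(isZombie = true))
```

Since speculative tasks are rare relative to failures, the extra revive calls introduced by the simplified condition are bounded and small, which is the argument above.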