Github user tgravescs commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11916#discussion_r57369488

    --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala ---
    @@ -620,6 +620,14 @@ private[spark] class TaskSetManager(
         // Note: "result.value()" only deserializes the value when it's called at the first time, so
         // here "result.value()" just returns the value and won't block other threads.
         sched.dagScheduler.taskEnded(tasks(index), Success, result.value(), result.accumUpdates, info)
    +    // Kill other task attempts if any as the one attempt succeeded
    +    for (attemptInfo <- taskAttempts(index) if attemptInfo.attemptNumber != info.attemptNumber
    --- End diff --

I'll try the patch out, but I'm pretty sure it will still show multiple succeeded tasks that were speculative. SparkHadoopMapRedUtil.commitTask has this check:

    if (committer.needsTaskCommit(mrTaskContext)) {
      ...
    } else {
      // Some other attempt committed the output, so we do nothing and signal success
      logInfo(s"No need to commit output of task because needsTaskCommit=false: $mrTaskAttemptID")
    }

So if another task commits and then the second, speculative task tries to commit, it is simply going to log this message and send the task-finished event back to the driver, and the driver is going to take that as a success.

If your intention is just to solve the issue of killing extra task attempts, perhaps retarget this PR to https://issues.apache.org/jira/browse/SPARK-10530 and leave SPARK-13343 open.
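To make the race concrete, here is a minimal, self-contained sketch of why both attempts still get reported as successful. This is not the actual Spark code; OutputCoordinator, Attempt, and runAttempt are made-up names standing in for the needsTaskCommit coordination and the commitTask branch quoted above.

    // Hypothetical sketch: two attempts of the same partition race to commit,
    // only one actually commits output, but both finish "successfully".
    object SpeculativeCommitSketch {

      // Stand-in for the commit coordination behind needsTaskCommit:
      // only the first attempt asking for a given partition may commit.
      class OutputCoordinator {
        private val committed = scala.collection.mutable.Set[Int]()
        def needsTaskCommit(partition: Int): Boolean = synchronized {
          if (committed.contains(partition)) false
          else { committed += partition; true }
        }
      }

      case class Attempt(partition: Int, attemptNumber: Int)

      // Mirrors the commitTask branch above: when needsTaskCommit is false the
      // attempt just logs and still finishes as a success.
      def runAttempt(coordinator: OutputCoordinator, attempt: Attempt): String = {
        if (coordinator.needsTaskCommit(attempt.partition)) {
          s"attempt ${attempt.attemptNumber}: committed output"
        } else {
          s"attempt ${attempt.attemptNumber}: no need to commit, still reported as Success"
        }
      }

      def main(args: Array[String]): Unit = {
        val coordinator = new OutputCoordinator
        // Original attempt and its speculative copy for the same partition.
        Seq(Attempt(0, 0), Attempt(0, 1))
          .map(runAttempt(coordinator, _))
          .foreach(println)
        // Both lines print: the driver sees two successful task-end events,
        // which is why the UI still shows multiple succeeded speculative tasks.
      }
    }

In other words, killing the other attempts when one succeeds only narrows the window; an attempt that has already reached the commit path still reports Success even when it didn't commit anything.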