[GitHub] spark pull request: [SPARK-10530] [CORE] Kill other task attempts ...
Github user devaraj-kavali commented on the pull request: https://github.com/apache/spark/pull/11996#issuecomment-222589679

Thanks @kayousterhout.
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/11996
Github user kayousterhout commented on the pull request: https://github.com/apache/spark/pull/11996#issuecomment-222556829

I merged this into master (so it will be in 2.1); thanks for your work on this @devaraj-kavali!
Github user devaraj-kavali commented on the pull request: https://github.com/apache/spark/pull/11996#issuecomment-222104830

@kayousterhout, I have added the requested inline comments, and the build is passing now; please take a look. Thanks
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11996#issuecomment-222103762

Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/59477/
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11996#issuecomment-222103760

Merged build finished. Test PASSed.
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11996#issuecomment-222103539

**[Test build #59477 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/59477/consoleFull)** for PR 11996 at commit [`8767e4c`](https://github.com/apache/spark/commit/8767e4cd7a764a4aec080fdbf7669cb1f8bfd195).

* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11996#issuecomment-222082443

**[Test build #59477 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/59477/consoleFull)** for PR 11996 at commit [`8767e4c`](https://github.com/apache/spark/commit/8767e4cd7a764a4aec080fdbf7669cb1f8bfd195).
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11996#issuecomment-222010203

Merged build finished. Test PASSed.
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11996#issuecomment-222010206

Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/59412/
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11996#issuecomment-222009995

**[Test build #59412 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/59412/consoleFull)** for PR 11996 at commit [`b05908c`](https://github.com/apache/spark/commit/b05908c2dfcee00cb732a72bb6ab1000a5cf5cd0).

* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
Github user kayousterhout commented on a diff in the pull request: https://github.com/apache/spark/pull/11996#discussion_r64814037

--- Diff: core/src/test/scala/org/apache/spark/scheduler/TaskSetManagerSuite.scala ---
@@ -795,6 +795,7 @@ class TaskSetManagerSuite extends SparkFunSuite with LocalSparkContext with Logg
     sc = new SparkContext("local", "test")
     val sched = new FakeTaskScheduler(sc, ("exec1", "host1"), ("exec2", "host2"))
     val taskSet = FakeTask.createTaskSet(4)
+    sc.conf.set("spark.speculation.multiplier", "0.0")
--- End diff --

Can you add a comment here ("Set the speculation multiplier to be 0 so speculative tasks are launched immediately")?
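A sketch of the requested change, using the reviewer's suggested wording verbatim:

```scala
// Set the speculation multiplier to be 0 so speculative tasks are launched immediately
sc.conf.set("spark.speculation.multiplier", "0.0")
```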
Github user kayousterhout commented on a diff in the pull request: https://github.com/apache/spark/pull/11996#discussion_r64813822

--- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala ---
@@ -64,6 +64,8 @@ private[spark] class TaskSchedulerImpl(
   // How often to check for speculative tasks
   val SPECULATION_INTERVAL_MS = conf.getTimeAsMs("spark.speculation.interval", "100ms")

+  val MIN_TIME_TO_SPECULATION = 100
--- End diff --

Can you add a comment here? Something like "Duplicate copies of a task will only be launched if the original copy has been running for at least this amount of time. This is to avoid the overhead of launching speculative copies of tasks that are very short."
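With the suggested comment attached, the constant would read:

```scala
// Duplicate copies of a task will only be launched if the original copy has been running
// for at least this amount of time. This is to avoid the overhead of launching speculative
// copies of tasks that are very short.
val MIN_TIME_TO_SPECULATION = 100
```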
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11996#issuecomment-221982283

**[Test build #59412 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/59412/consoleFull)** for PR 11996 at commit [`b05908c`](https://github.com/apache/spark/commit/b05908c2dfcee00cb732a72bb6ab1000a5cf5cd0).
Github user kayousterhout commented on the pull request: https://github.com/apache/spark/pull/11996#issuecomment-221981351

Looks like there are build errors: you need to update the makeProgressBar calls in streaming/src/main/scala/org/apache/spark/streaming/ui/AllBatchesTable.scala:95 and streaming/src/main/scala/org/apache/spark/streaming/ui/BatchPage.scala:144.
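A minimal sketch of the kind of fix those two call sites need, assuming they simply forward their existing counts; the variable names here are placeholders, not the actual fields in those files:

```scala
// The streaming UI does not track killed tasks, so it passes killed = 0 through
// to the widened makeProgressBar signature.
UIUtils.makeProgressBar(started, completed, failed, skipped, killed = 0, total)
```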
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11996#issuecomment-221970385

Merged build finished. Test FAILed.
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11996#issuecomment-221970378

**[Test build #59404 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/59404/consoleFull)** for PR 11996 at commit [`db93d17`](https://github.com/apache/spark/commit/db93d1766f39d4f2d9fe9be0d1ce9edd68110f76).

* This patch **fails to build**.
* This patch merges cleanly.
* This patch adds no public classes.
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11996#issuecomment-221970389

Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/59404/
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11996#issuecomment-221968760

**[Test build #59404 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/59404/consoleFull)** for PR 11996 at commit [`db93d17`](https://github.com/apache/spark/commit/db93d1766f39d4f2d9fe9be0d1ce9edd68110f76).
Github user devaraj-kavali commented on the pull request: https://github.com/apache/spark/pull/11996#issuecomment-221968586

@kayousterhout Thanks a lot for your review and comments. I have fixed them; please take a look.
Github user kayousterhout commented on a diff in the pull request: https://github.com/apache/spark/pull/11996#discussion_r64658256

--- Diff: core/src/test/scala/org/apache/spark/scheduler/TaskSetManagerSuite.scala ---
@@ -789,6 +791,52 @@ class TaskSetManagerSuite extends SparkFunSuite with LocalSparkContext with Logg
     assert(TaskLocation("executor_host1_3") === ExecutorCacheTaskLocation("host1", "3"))
   }

+  test("Kill other task attempts when one attempt belonging to the same task succeeds") {
+    sc = new SparkContext("local", "test")
+    val sched = new FakeTaskScheduler(sc, ("exec1", "host1"), ("exec2", "host2"))
+    val taskSet = FakeTask.createTaskSet(4)
+    val manager = new TaskSetManager(sched, taskSet, MAX_TASK_FAILURES)
+    val accumUpdatesByTask: Array[Seq[AccumulatorV2[_, _]]] = taskSet.tasks.map { task =>
+      task.metrics.internalAccums
+    }
+    // Offer resources for 4 tasks to start
+    for ((k, v) <- List(
+        "exec1" -> "host1",
+        "exec1" -> "host1",
+        "exec2" -> "host2",
+        "exec2" -> "host2")) {
+      val taskOption = manager.resourceOffer(k, v, NO_PREF)
+      assert(taskOption.isDefined)
+      val task = taskOption.get
+      assert(task.executorId === k)
+    }
+    assert(sched.startedTasks.toSet === Set(0, 1, 2, 3))
+    // Complete the 3 tasks and leave 1 task in running
+    for (id <- Set(0, 1, 2)) {
+      manager.handleSuccessfulTask(id, createTaskResult(id, accumUpdatesByTask(id)))
+      assert(sched.endedTasks(id) === Success)
+    }
+
+    // Wait for the threshold time to start speculative attempt for the running task
+    Thread.sleep(100)
+    assert(manager.checkSpeculatableTasks)
+    // Offer resource to start the speculative attempt for the running task
+    val taskOption5 = manager.resourceOffer("exec1", "host1", NO_PREF)
+    assert(taskOption5.isDefined)
+    val task5 = taskOption5.get
+    assert(task5.taskId === 4)
+    assert(task5.executorId === "exec1")
+    assert(task5.attemptNumber === 1)
+    sched.backend = mock(classOf[SchedulerBackend])
+    // Complete the speculative attempt for the running task
+    manager.handleSuccessfulTask(4, createTaskResult(3, accumUpdatesByTask(3)))
+    // It ends the task with success status as part of manager.handleSuccessfulTask() and
--- End diff --

I see what's going on here now -- can you change this comment to something like "Because the SchedulerBackend was a mock, the 2nd copy of the task won't actually be killed, so the FakeTaskScheduler is only told about the successful completion of the speculated task." I think it would also be helpful to move the verify(...) call to be before this.
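A sketch of the suggested reordering, assuming Mockito's verify and the sched.backend.killTask(taskId, executorId, interruptThread) call discussed elsewhere in this thread; the exact arguments are illustrative, not quoted from the patch:

```scala
// Because the SchedulerBackend was a mock, the 2nd copy of the task won't actually be
// killed, so the FakeTaskScheduler is only told about the successful completion of the
// speculated task.
verify(sched.backend).killTask(3, "exec2", true)
assert(sched.endedTasks(3) === Success)
```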
Github user kayousterhout commented on the pull request: https://github.com/apache/spark/pull/11996#issuecomment-221716392

A few last comments to aid readability of this code. Thanks!
Github user kayousterhout commented on a diff in the pull request: https://github.com/apache/spark/pull/11996#discussion_r64656455

--- Diff: core/src/test/scala/org/apache/spark/scheduler/TaskSetManagerSuite.scala ---
@@ -789,6 +791,52 @@ class TaskSetManagerSuite extends SparkFunSuite with LocalSparkContext with Logg
     assert(TaskLocation("executor_host1_3") === ExecutorCacheTaskLocation("host1", "3"))
   }

+  test("Kill other task attempts when one attempt belonging to the same task succeeds") {
+    sc = new SparkContext("local", "test")
+    val sched = new FakeTaskScheduler(sc, ("exec1", "host1"), ("exec2", "host2"))
+    val taskSet = FakeTask.createTaskSet(4)
+    val manager = new TaskSetManager(sched, taskSet, MAX_TASK_FAILURES)
+    val accumUpdatesByTask: Array[Seq[AccumulatorV2[_, _]]] = taskSet.tasks.map { task =>
+      task.metrics.internalAccums
+    }
+    // Offer resources for 4 tasks to start
+    for ((k, v) <- List(
+        "exec1" -> "host1",
+        "exec1" -> "host1",
+        "exec2" -> "host2",
+        "exec2" -> "host2")) {
+      val taskOption = manager.resourceOffer(k, v, NO_PREF)
+      assert(taskOption.isDefined)
+      val task = taskOption.get
+      assert(task.executorId === k)
+    }
+    assert(sched.startedTasks.toSet === Set(0, 1, 2, 3))
+    // Complete the 3 tasks and leave 1 task in running
+    for (id <- Set(0, 1, 2)) {
+      manager.handleSuccessfulTask(id, createTaskResult(id, accumUpdatesByTask(id)))
+      assert(sched.endedTasks(id) === Success)
+    }
+
+    // Wait for the threshold time to start speculative attempt for the running task
+    Thread.sleep(100)
+    assert(manager.checkSpeculatableTasks)
+    // Offer resource to start the speculative attempt for the running task
+    val taskOption5 = manager.resourceOffer("exec1", "host1", NO_PREF)
+    assert(taskOption5.isDefined)
+    val task5 = taskOption5.get
+    assert(task5.taskId === 4)
+    assert(task5.executorId === "exec1")
+    assert(task5.attemptNumber === 1)
--- End diff --

Can you also check that task5.index === 3 (maybe put this first)?
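With the suggested assertion added first:

```scala
val task5 = taskOption5.get
assert(task5.index === 3)
assert(task5.taskId === 4)
assert(task5.executorId === "exec1")
assert(task5.attemptNumber === 1)
```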
Github user kayousterhout commented on a diff in the pull request: https://github.com/apache/spark/pull/11996#discussion_r64655722

--- Diff: core/src/test/scala/org/apache/spark/ui/jobs/JobProgressListenerSuite.scala ---
@@ -254,6 +253,11 @@ class JobProgressListenerSuite extends SparkFunSuite with LocalSparkContext with
       assert(listener.stageIdToData((task.stageId, 0)).numFailedTasks === failCount)
     }

+    // Check the killed tasks count.
--- End diff --

change to "Make sure killed tasks are accounted for correctly"?
Github user kayousterhout commented on a diff in the pull request: https://github.com/apache/spark/pull/11996#discussion_r64655524

--- Diff: core/src/test/scala/org/apache/spark/scheduler/TaskSetManagerSuite.scala ---
@@ -789,6 +791,51 @@ class TaskSetManagerSuite extends SparkFunSuite with LocalSparkContext with Logg
     assert(TaskLocation("executor_host1_3") === ExecutorCacheTaskLocation("host1", "3"))
   }

+  test("Kill other task attempts when one attempt belonging to the same task succeeds") {
+    sc = new SparkContext("local", "test")
+    val sched = new FakeTaskScheduler(sc, ("exec1", "host1"), ("exec2", "host2"))
+    val taskSet = FakeTask.createTaskSet(4)
+    val manager = new TaskSetManager(sched, taskSet, MAX_TASK_FAILURES)
+    val accumUpdatesByTask: Array[Seq[AccumulableInfo]] = taskSet.tasks.map { task =>
+      task.initialAccumulators.map { a => a.toInfo(Some(0L), None) }
+    }
+    // Offer resources for 4 tasks to start
+    for ((k, v) <- List(
+        "exec1" -> "host1",
+        "exec1" -> "host1",
+        "exec2" -> "host2",
+        "exec2" -> "host2")) {
+      val taskOption = manager.resourceOffer(k, v, NO_PREF)
+      assert(taskOption.isDefined)
+      val task = taskOption.get
+      assert(task.executorId === k)
+    }
+    assert(sched.startedTasks.toSet === Set(0, 1, 2, 3))
+    // Complete the 3 tasks and leave 1 task in running
+    for (id <- Set(0, 1, 2)) {
+      manager.handleSuccessfulTask(id, createTaskResult(id, accumUpdatesByTask(id)))
+      assert(sched.endedTasks(id) === Success)
+    }
+
+    // Wait for the threshold time to start speculative attempt for the running task
+    Thread.sleep(100)
--- End diff --

This does seem a little trickier than I'd anticipated. I think the best thing to do is:

1. change the Schedulable class to take a minTimeToSpeculation required argument. This looks pretty simple to do -- you just need to change two implementations, and this is a private Spark class, so we're not changing a public or developer API.
2. add a constant MIN_TIME_TO_SPECULATION in the TaskSchedulerImpl object, and pass that value in when TaskSchedulerImpl calls checkSpeculatableTasks.

I think over time we should have more tests to verify the speculation behavior, so this relatively small change to make this code path more testable seems worthwhile.
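A sketch of that two-part refactoring; the trait's other members and the exact wiring through the root pool are assumptions, not quotes from the patch:

```scala
// (1) Schedulable takes the threshold as a required argument, so both
// implementations (Pool and TaskSetManager) must accept it.
private[spark] trait Schedulable {
  def checkSpeculatableTasks(minTimeToSpeculation: Int): Boolean
  // ...other members elided
}

// (2) TaskSchedulerImpl owns the constant and passes it down; a test can call
// checkSpeculatableTasks(0) to avoid sleeping.
val MIN_TIME_TO_SPECULATION = 100
rootPool.checkSpeculatableTasks(MIN_TIME_TO_SPECULATION)
```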
Github user kayousterhout commented on a diff in the pull request: https://github.com/apache/spark/pull/11996#discussion_r64654392

--- Diff: core/src/main/scala/org/apache/spark/ui/UIUtils.scala ---
@@ -338,6 +338,16 @@ private[spark] object UIUtils extends Logging {
       failed: Int,
       skipped: Int,
       total: Int): Seq[Node] = {
+    makeProgressBar(started, completed, failed, skipped, killed = 0, total)
+  }
+
+  def makeProgressBar(
--- End diff --

Can you change the main makeProgressBar method to always accept a killed argument (and then update the 2-3 other locations where this is called to pass in killed = 0)? It looks like the other places that use this method (e.g., the streaming BatchPage UI class) have access to the correct number of killed tasks, so should be setting it accordingly (and it's a little cumbersome to have to maintain both versions of this method).
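A sketch of the single consolidated signature being requested; the parameter list is assumed from the diff context above, and the body is elided:

```scala
def makeProgressBar(
    started: Int,
    completed: Int,
    failed: Int,
    skipped: Int,
    killed: Int,
    total: Int): Seq[Node] = {
  // ...render the bar, including a segment for killed tasks
}

// Call sites without a killed count pass it by name:
makeProgressBar(started, completed, failed, skipped, killed = 0, total)
```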
Github user kayousterhout commented on a diff in the pull request: https://github.com/apache/spark/pull/11996#discussion_r64653035

--- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskInfo.scala ---
@@ -20,6 +20,8 @@ package org.apache.spark.scheduler
 import scala.collection.mutable.ListBuffer

 import org.apache.spark.annotation.DeveloperApi
+import org.apache.spark.TaskState
--- End diff --

alphabetization (capital letters should come before lowercase -- so this and the import below it should both come above org.apache.spark.annotation.DeveloperApi)
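The requested ordering (the second added import, which the diff hunk does not show, moves up the same way):

```scala
import org.apache.spark.TaskState          // capital 'T' sorts before lowercase 'annotation'
import org.apache.spark.annotation.DeveloperApi
```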
Github user kayousterhout commented on a diff in the pull request: https://github.com/apache/spark/pull/11996#discussion_r64653419

--- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskInfo.scala ---
@@ -58,24 +60,26 @@ class TaskInfo(
   var failed = false

+  var killed = false
+
   private[spark] def markGettingResult(time: Long = System.currentTimeMillis) {
     gettingResultTime = time
   }

-  private[spark] def markSuccessful(time: Long = System.currentTimeMillis) {
+  private[spark] def markFinished(time: Long = System.currentTimeMillis, state: TaskState) {
--- End diff --

Can you move state to be first? We usually put default arguments last, and then you don't need to always name the "state" argument when calling this method.
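With the parameters reordered as suggested, call sites no longer need a named argument (the caller shown is illustrative):

```scala
private[spark] def markFinished(state: TaskState, time: Long = System.currentTimeMillis) {
  // ...
}

// Caller:
info.markFinished(TaskState.KILLED)
```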
Github user devaraj-kavali commented on the pull request: https://github.com/apache/spark/pull/11996#issuecomment-221531103

@kayousterhout, can you take a look at this?
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11996#issuecomment-220114675

Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/58791/
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11996#issuecomment-220114670

Merged build finished. Test PASSed.
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11996#issuecomment-220114358

**[Test build #58791 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58791/consoleFull)** for PR 11996 at commit [`4aa7e83`](https://github.com/apache/spark/commit/4aa7e83bd375b04e550fcb4cb18a8bcfc8e78e17).

* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
Github user devaraj-kavali commented on a diff in the pull request: https://github.com/apache/spark/pull/11996#discussion_r63738214

--- Diff: core/src/test/scala/org/apache/spark/scheduler/TaskSetManagerSuite.scala ---
@@ -789,6 +791,51 @@ class TaskSetManagerSuite extends SparkFunSuite with LocalSparkContext with Logg
     assert(TaskLocation("executor_host1_3") === ExecutorCacheTaskLocation("host1", "3"))
   }

+  test("Kill other task attempts when one attempt belonging to the same task succeeds") {
+    sc = new SparkContext("local", "test")
+    val sched = new FakeTaskScheduler(sc, ("exec1", "host1"), ("exec2", "host2"))
+    val taskSet = FakeTask.createTaskSet(4)
+    val manager = new TaskSetManager(sched, taskSet, MAX_TASK_FAILURES)
+    val accumUpdatesByTask: Array[Seq[AccumulableInfo]] = taskSet.tasks.map { task =>
+      task.initialAccumulators.map { a => a.toInfo(Some(0L), None) }
+    }
+    // Offer resources for 4 tasks to start
+    for ((k, v) <- List(
+        "exec1" -> "host1",
+        "exec1" -> "host1",
+        "exec2" -> "host2",
+        "exec2" -> "host2")) {
+      val taskOption = manager.resourceOffer(k, v, NO_PREF)
+      assert(taskOption.isDefined)
+      val task = taskOption.get
+      assert(task.executorId === k)
+    }
+    assert(sched.startedTasks.toSet === Set(0, 1, 2, 3))
+    // Complete the 3 tasks and leave 1 task in running
+    for (id <- Set(0, 1, 2)) {
+      manager.handleSuccessfulTask(id, createTaskResult(id, accumUpdatesByTask(id)))
+      assert(sched.endedTasks(id) === Success)
+    }
+
+    // Wait for the threshold time to start speculative attempt for the running task
+    Thread.sleep(100)
--- End diff --

I feel adding an argument to **checkSpeculatableTasks()** would require changing the signature of the method in the Schedulable interface and, correspondingly, all of its implementations. I am thinking of moving the code in **TaskSetManager.checkSpeculatableTasks()** to another method which takes an argument (i.e., minTimeToSpeculation: Int), and the same method can be used in the test. Please give your opinion on this.
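A sketch of that alternative: the Schedulable interface keeps its no-argument signature, and an extracted overload exposes the threshold to tests (names and access modifier are assumptions):

```scala
// Interface method is unchanged...
override def checkSpeculatableTasks(): Boolean = {
  checkSpeculatableTasks(minTimeToSpeculation = 100)
}

// ...and the extracted method can be driven directly from a test with a 0 threshold.
private[scheduler] def checkSpeculatableTasks(minTimeToSpeculation: Int): Boolean = {
  ??? // existing speculation logic, parameterized on the threshold
}
```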
Github user devaraj-kavali commented on a diff in the pull request: https://github.com/apache/spark/pull/11996#discussion_r63736986

--- Diff: core/src/test/scala/org/apache/spark/scheduler/TaskSetManagerSuite.scala ---
@@ -789,6 +791,51 @@ class TaskSetManagerSuite extends SparkFunSuite with LocalSparkContext with Logg
     assert(TaskLocation("executor_host1_3") === ExecutorCacheTaskLocation("host1", "3"))
   }

+  test("Kill other task attempts when one attempt belonging to the same task succeeds") {
+    sc = new SparkContext("local", "test")
+    val sched = new FakeTaskScheduler(sc, ("exec1", "host1"), ("exec2", "host2"))
+    val taskSet = FakeTask.createTaskSet(4)
+    val manager = new TaskSetManager(sched, taskSet, MAX_TASK_FAILURES)
+    val accumUpdatesByTask: Array[Seq[AccumulableInfo]] = taskSet.tasks.map { task =>
+      task.initialAccumulators.map { a => a.toInfo(Some(0L), None) }
+    }
+    // Offer resources for 4 tasks to start
+    for ((k, v) <- List(
+        "exec1" -> "host1",
+        "exec1" -> "host1",
+        "exec2" -> "host2",
+        "exec2" -> "host2")) {
+      val taskOption = manager.resourceOffer(k, v, NO_PREF)
+      assert(taskOption.isDefined)
+      val task = taskOption.get
+      assert(task.executorId === k)
+    }
+    assert(sched.startedTasks.toSet === Set(0, 1, 2, 3))
+    // Complete the 3 tasks and leave 1 task in running
+    for (id <- Set(0, 1, 2)) {
+      manager.handleSuccessfulTask(id, createTaskResult(id, accumUpdatesByTask(id)))
+      assert(sched.endedTasks(id) === Success)
+    }
+
+    // Wait for the threshold time to start speculative attempt for the running task
+    Thread.sleep(100)
+    val speculation = manager.checkSpeculatableTasks
+    assert(speculation === true)
+    // Offer resource to start the speculative attempt for the running task
+    val taskOption5 = manager.resourceOffer("exec1", "host1", NO_PREF)
+    assert(taskOption5.isDefined)
+    val task5 = taskOption5.get
+    assert(task5.taskId === 4)
+    assert(task5.executorId === "exec1")
+    assert(task5.attemptNumber === 1)
+    sched.backend = mock(classOf[SchedulerBackend])
+    // Complete the speculative attempt for the running task
+    manager.handleSuccessfulTask(4, createTaskResult(3, accumUpdatesByTask(3)))
+    assert(sched.endedTasks(3) === Success)
--- End diff --

Here **sched.backend** is **mock(classOf[SchedulerBackend])**, and as part of **manager.handleSuccessfulTask()** it issues **sched.backend.killTask()** for any other attempts. Since it is a mock invocation, it only verifies that the kill invocation for the other attempts happens. I have added the same in the comment.
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11996#issuecomment-220082266

**[Test build #58791 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58791/consoleFull)** for PR 11996 at commit [`4aa7e83`](https://github.com/apache/spark/commit/4aa7e83bd375b04e550fcb4cb18a8bcfc8e78e17).
Github user devaraj-kavali commented on the pull request: https://github.com/apache/spark/pull/11996#issuecomment-220082195

Thanks a lot @kayousterhout for the review.
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11996#issuecomment-219752963

Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/58693/
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11996#issuecomment-219752958

Merged build finished. Test PASSed.
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11996#issuecomment-219752590

**[Test build #58693 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58693/consoleFull)** for PR 11996 at commit [`5f2eee8`](https://github.com/apache/spark/commit/5f2eee86228aea6a397b41f5df5644f13cf788c1).

* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11996#issuecomment-219716906

**[Test build #58693 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58693/consoleFull)** for PR 11996 at commit [`5f2eee8`](https://github.com/apache/spark/commit/5f2eee86228aea6a397b41f5df5644f13cf788c1).
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11996#issuecomment-219698138

Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/58689/
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11996#issuecomment-219698135

**[Test build #58689 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58689/consoleFull)** for PR 11996 at commit [`08f636b`](https://github.com/apache/spark/commit/08f636b2a8ba74f9415aa200bb319899a84f6cf8).

* This patch **fails to build**.
* This patch merges cleanly.
* This patch adds no public classes.
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11996#issuecomment-219698137

Merged build finished. Test FAILed.
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11996#issuecomment-219697407

**[Test build #58689 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58689/consoleFull)** for PR 11996 at commit [`08f636b`](https://github.com/apache/spark/commit/08f636b2a8ba74f9415aa200bb319899a84f6cf8).
Github user kayousterhout commented on a diff in the pull request: https://github.com/apache/spark/pull/11996#discussion_r63449411

--- Diff: core/src/test/scala/org/apache/spark/scheduler/TaskSetManagerSuite.scala ---
@@ -789,6 +791,51 @@ class TaskSetManagerSuite extends SparkFunSuite with LocalSparkContext with Logg
     assert(TaskLocation("executor_host1_3") === ExecutorCacheTaskLocation("host1", "3"))
   }

+  test("Kill other task attempts when one attempt belonging to the same task succeeds") {
+    sc = new SparkContext("local", "test")
+    val sched = new FakeTaskScheduler(sc, ("exec1", "host1"), ("exec2", "host2"))
+    val taskSet = FakeTask.createTaskSet(4)
+    val manager = new TaskSetManager(sched, taskSet, MAX_TASK_FAILURES)
+    val accumUpdatesByTask: Array[Seq[AccumulableInfo]] = taskSet.tasks.map { task =>
+      task.initialAccumulators.map { a => a.toInfo(Some(0L), None) }
+    }
+    // Offer resources for 4 tasks to start
+    for ((k, v) <- List(
+        "exec1" -> "host1",
+        "exec1" -> "host1",
+        "exec2" -> "host2",
+        "exec2" -> "host2")) {
+      val taskOption = manager.resourceOffer(k, v, NO_PREF)
+      assert(taskOption.isDefined)
+      val task = taskOption.get
+      assert(task.executorId === k)
+    }
+    assert(sched.startedTasks.toSet === Set(0, 1, 2, 3))
+    // Complete the 3 tasks and leave 1 task in running
+    for (id <- Set(0, 1, 2)) {
+      manager.handleSuccessfulTask(id, createTaskResult(id, accumUpdatesByTask(id)))
+      assert(sched.endedTasks(id) === Success)
+    }
+
+    // Wait for the threshold time to start speculative attempt for the running task
+    Thread.sleep(100)
+    val speculation = manager.checkSpeculatableTasks
+    assert(speculation === true)
+    // Offer resource to start the speculative attempt for the running task
+    val taskOption5 = manager.resourceOffer("exec1", "host1", NO_PREF)
+    assert(taskOption5.isDefined)
+    val task5 = taskOption5.get
+    assert(task5.taskId === 4)
+    assert(task5.executorId === "exec1")
+    assert(task5.attemptNumber === 1)
+    sched.backend = mock(classOf[SchedulerBackend])
+    // Complete the speculative attempt for the running task
+    manager.handleSuccessfulTask(4, createTaskResult(3, accumUpdatesByTask(3)))
+    assert(sched.endedTasks(3) === Success)
--- End diff --

Why isn't this Killed? My understanding is that the task set finishing should trigger this call: https://github.com/devaraj-kavali/spark/blob/ba9ffab65f9f003af3a27671b8610525c2e38d84/core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala#L733 which will cause the FakeDagScheduler to set sched.endedTasks(3) to Killed. Or does that happen *before* the successful attempt happens, so the successful one overrides it? (in any case, can you add a comment describing this?)
Github user kayousterhout commented on a diff in the pull request: https://github.com/apache/spark/pull/11996#discussion_r63448950

--- Diff: core/src/test/scala/org/apache/spark/scheduler/TaskSetManagerSuite.scala ---
@@ -789,6 +791,51 @@ class TaskSetManagerSuite extends SparkFunSuite with LocalSparkContext with Logg
     assert(TaskLocation("executor_host1_3") === ExecutorCacheTaskLocation("host1", "3"))
   }

+  test("Kill other task attempts when one attempt belonging to the same task succeeds") {
+    sc = new SparkContext("local", "test")
+    val sched = new FakeTaskScheduler(sc, ("exec1", "host1"), ("exec2", "host2"))
+    val taskSet = FakeTask.createTaskSet(4)
+    val manager = new TaskSetManager(sched, taskSet, MAX_TASK_FAILURES)
+    val accumUpdatesByTask: Array[Seq[AccumulableInfo]] = taskSet.tasks.map { task =>
+      task.initialAccumulators.map { a => a.toInfo(Some(0L), None) }
+    }
+    // Offer resources for 4 tasks to start
+    for ((k, v) <- List(
+        "exec1" -> "host1",
+        "exec1" -> "host1",
+        "exec2" -> "host2",
+        "exec2" -> "host2")) {
+      val taskOption = manager.resourceOffer(k, v, NO_PREF)
+      assert(taskOption.isDefined)
+      val task = taskOption.get
+      assert(task.executorId === k)
+    }
+    assert(sched.startedTasks.toSet === Set(0, 1, 2, 3))
+    // Complete the 3 tasks and leave 1 task in running
+    for (id <- Set(0, 1, 2)) {
+      manager.handleSuccessfulTask(id, createTaskResult(id, accumUpdatesByTask(id)))
+      assert(sched.endedTasks(id) === Success)
+    }
+
+    // Wait for the threshold time to start speculative attempt for the running task
+    Thread.sleep(100)
+    val speculation = manager.checkSpeculatableTasks
+    assert(speculation === true)
--- End diff --

here just do "assert(manager.checkSpeculatableTasks)"? (since the === isn't helpful for boolean values, where it's obvious what the expected / actual were)
Github user kayousterhout commented on a diff in the pull request: https://github.com/apache/spark/pull/11996#discussion_r63448866

--- Diff: core/src/test/scala/org/apache/spark/scheduler/TaskSetManagerSuite.scala ---
@@ -789,6 +791,51 @@ class TaskSetManagerSuite extends SparkFunSuite with LocalSparkContext with Logg
     assert(TaskLocation("executor_host1_3") === ExecutorCacheTaskLocation("host1", "3"))
   }

+  test("Kill other task attempts when one attempt belonging to the same task succeeds") {
+    sc = new SparkContext("local", "test")
+    val sched = new FakeTaskScheduler(sc, ("exec1", "host1"), ("exec2", "host2"))
+    val taskSet = FakeTask.createTaskSet(4)
+    val manager = new TaskSetManager(sched, taskSet, MAX_TASK_FAILURES)
+    val accumUpdatesByTask: Array[Seq[AccumulableInfo]] = taskSet.tasks.map { task =>
+      task.initialAccumulators.map { a => a.toInfo(Some(0L), None) }
+    }
+    // Offer resources for 4 tasks to start
+    for ((k, v) <- List(
+        "exec1" -> "host1",
+        "exec1" -> "host1",
+        "exec2" -> "host2",
+        "exec2" -> "host2")) {
+      val taskOption = manager.resourceOffer(k, v, NO_PREF)
+      assert(taskOption.isDefined)
+      val task = taskOption.get
+      assert(task.executorId === k)
+    }
+    assert(sched.startedTasks.toSet === Set(0, 1, 2, 3))
+    // Complete the 3 tasks and leave 1 task in running
+    for (id <- Set(0, 1, 2)) {
+      manager.handleSuccessfulTask(id, createTaskResult(id, accumUpdatesByTask(id)))
+      assert(sched.endedTasks(id) === Success)
+    }
+
+    // Wait for the threshold time to start speculative attempt for the running task
+    Thread.sleep(100)
--- End diff --

What about adding an argument to checkSpeculatableTasks to control the magic-100 value, so something like checkSpeculatableTasks(minTimeToSpeculation: Int = 100) and then we can set it to 0 for tests? In general it's nice to avoid adding sleeps to the (already slow) tests.
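A sketch of the default-argument variant proposed here, with the test call that replaces the sleep:

```scala
// The magic 100ms threshold becomes an overridable default.
def checkSpeculatableTasks(minTimeToSpeculation: Int = 100): Boolean = {
  ??? // existing speculation logic
}

// In the test, no Thread.sleep(100) is needed:
assert(manager.checkSpeculatableTasks(minTimeToSpeculation = 0))
```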
Github user kayousterhout commented on a diff in the pull request: https://github.com/apache/spark/pull/11996#discussion_r63448080

--- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala ---
@@ -620,6 +620,14 @@ private[spark] class TaskSetManager(
     // Note: "result.value()" only deserializes the value when it's called at the first time, so
     // here "result.value()" just returns the value and won't block other threads.
     sched.dagScheduler.taskEnded(tasks(index), Success, result.value(), result.accumUpdates, info)
+    // Kill other task attempts if any as the one attempt succeeded
+    for (attemptInfo <- taskAttempts(index) if attemptInfo.attemptNumber != info.attemptNumber
+        && attemptInfo.running) {
--- End diff --

I think you don't need the middle condition here (if attemptInfo.attemptNumber != info.attemptNumber) since this attempt will no longer be running (since markSuccessful() was called above), so the last condition will fail?
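The simplified guard the reviewer describes; the killTask call inside the loop is assumed from the rest of this thread rather than quoted from the patch:

```scala
// markSuccessful() has already run, so the successful attempt no longer satisfies
// `attemptInfo.running`; filtering on `running` alone suffices.
for (attemptInfo <- taskAttempts(index) if attemptInfo.running) {
  sched.backend.killTask(attemptInfo.taskId, attemptInfo.executorId, interruptThread = true)
}
```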
[GitHub] spark pull request: [SPARK-10530] [CORE] Kill other task attempts ...
Github user kayousterhout commented on a diff in the pull request:
https://github.com/apache/spark/pull/11996#discussion_r63447725

--- Diff: core/src/main/scala/org/apache/spark/ui/UIUtils.scala ---
@@ -337,6 +337,16 @@ private[spark] object UIUtils extends Logging {
       failed: Int,
       skipped: Int,
       total: Int): Seq[Node] = {
+    makeProgressBar(started, completed, failed, skipped, 0, total)
--- End diff --

Can you use a named param here ("killed = 0")?
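A minimal sketch of the named-argument suggestion; the signature below is a simplified stand-in for UIUtils.makeProgressBar (the real method builds HTML and returns Seq[Node]). With several adjacent Int parameters, naming the literal makes the delegating call self-documenting:

```scala
// Simplified stand-in: the real method builds HTML; a string suffices here.
def makeProgressBar(started: Int, completed: Int, failed: Int,
    skipped: Int, killed: Int, total: Int): String =
  s"$completed/$total completed ($failed failed, $skipped skipped, $killed killed)"

// Without the name, a reader cannot tell which of the six Ints the 0 stands
// for; with it, the call site reads unambiguously:
val bar = makeProgressBar(started = 4, completed = 3, failed = 0,
  skipped = 0, killed = 0, total = 10)
```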
[GitHub] spark pull request: [SPARK-10530] [CORE] Kill other task attempts ...
Github user kayousterhout commented on a diff in the pull request:
https://github.com/apache/spark/pull/11996#discussion_r63447507

--- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskInfo.scala ---
@@ -71,11 +73,16 @@ class TaskInfo(
     failed = true
   }

+  private[spark] def markKilled(time: Long = System.currentTimeMillis) {
--- End diff --

Can you consolidate this and the two methods above into a single markFinished method that accepts a TaskState and a time? That method can then handle matching the TaskState to the appropriate value of killed / failed.
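A self-contained sketch of the proposed consolidation; TaskState is modeled here as a plain sealed trait rather than Spark's real enumeration, and TaskInfoSketch is a toy class, not the real TaskInfo:

```scala
// Toy TaskState; Spark's real one is an Enumeration with more states.
sealed trait TaskState
case object Finished extends TaskState
case object Failed extends TaskState
case object Killed extends TaskState

class TaskInfoSketch {
  var finishTime: Long = 0L
  var failed: Boolean = false
  var killed: Boolean = false

  // One method replaces markSuccessful / markFailed / markKilled; the
  // state decides which flag (if either) to raise.
  def markFinished(state: TaskState, time: Long = System.currentTimeMillis): Unit = {
    finishTime = time
    failed = state == Failed
    killed = state == Killed
  }
}
```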
[GitHub] spark pull request: [SPARK-10530] [CORE] Kill other task attempts ...
Github user kayousterhout commented on a diff in the pull request:
https://github.com/apache/spark/pull/11996#discussion_r63446802

--- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala ---
@@ -620,6 +620,14 @@ private[spark] class TaskSetManager(
     // Note: "result.value()" only deserializes the value when it's called at the first time, so
     // here "result.value()" just returns the value and won't block other threads.
     sched.dagScheduler.taskEnded(tasks(index), Success, result.value(), result.accumUpdates, info)
+    // Kill other task attempts if any as the one attempt succeeded
--- End diff --

Can you change this to "Kill any other attempts for the same task (since those are unnecessary now that one attempt completed successfully)."
[GitHub] spark pull request: [SPARK-10530] [CORE] Kill other task attempts ...
Github user kayousterhout commented on a diff in the pull request:
https://github.com/apache/spark/pull/11996#discussion_r63446363

--- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala ---
@@ -620,6 +620,14 @@ private[spark] class TaskSetManager(
     // Note: "result.value()" only deserializes the value when it's called at the first time, so
     // here "result.value()" just returns the value and won't block other threads.
     sched.dagScheduler.taskEnded(tasks(index), Success, result.value(), result.accumUpdates, info)
+    // Kill other task attempts if any as the one attempt succeeded
+    for (attemptInfo <- taskAttempts(index) if attemptInfo.attemptNumber != info.attemptNumber
+        && attemptInfo.running) {
+      logInfo("Killing attempt " + attemptInfo.attemptNumber + " for task " + attemptInfo.id +
--- End diff --

Can you use string interpolation to make this more concise and readable? E.g., s"Killing attempt ${attemptInfo.attemptNumber} for task ${attemptInfo.id} in stage " ...
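Spelled out in full, the interpolated form would look roughly like this; it is a sketch assembled from the field names visible in the diff above, not the PR's final line:

```scala
logInfo(s"Killing attempt ${attemptInfo.attemptNumber} for task ${attemptInfo.id} " +
  s"in stage ${taskSet.id} (TID ${attemptInfo.taskId}) on ${attemptInfo.host} " +
  s"as the attempt ${info.attemptNumber} succeeded on ${info.host}")
```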
[GitHub] spark pull request: [SPARK-10530] [CORE] Kill other task attempts ...
Github user devaraj-kavali commented on the pull request:
https://github.com/apache/spark/pull/11996#issuecomment-218671842

@kayousterhout, @markhamstra any comments, please?
[GitHub] spark pull request: [SPARK-10530] [CORE] Kill other task attempts ...
Github user tgravescs commented on the pull request:
https://github.com/apache/spark/pull/11996#issuecomment-211963096

ping @kayousterhout and @markhamstra
[GitHub] spark pull request: [SPARK-10530] [CORE] Kill other task attempts ...
Github user andrewor14 commented on the pull request:
https://github.com/apache/spark/pull/11996#issuecomment-206019800

For scheduler changes let's ping @kayousterhout and @markhamstra
[GitHub] spark pull request: [SPARK-10530] [CORE] Kill other task attempts ...
Github user tgravescs commented on the pull request:
https://github.com/apache/spark/pull/11996#issuecomment-205967941

I've been looking into this and running some tests, but so far I haven't actually had the kill triggered: the other attempts always end up marked as failed because they finish soon after the original. Trying some more and looking into a few other things. Also cc @andrewor14.
[GitHub] spark pull request: [SPARK-10530] [CORE] Kill other task attempts ...
Github user tgravescs commented on a diff in the pull request:
https://github.com/apache/spark/pull/11996#discussion_r58604142

--- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala ---
@@ -620,6 +620,14 @@ private[spark] class TaskSetManager(
     // Note: "result.value()" only deserializes the value when it's called at the first time, so
     // here "result.value()" just returns the value and won't block other threads.
     sched.dagScheduler.taskEnded(tasks(index), Success, result.value(), result.accumUpdates, info)
+    // Kill other task attempts if any as the one attempt succeeded
+    for (attemptInfo <- taskAttempts(index) if attemptInfo.attemptNumber != info.attemptNumber
+        && attemptInfo.running) {
+      logInfo("Killing attempt " + attemptInfo.attemptNumber + " for task " + attemptInfo.id +
+        " in stage " + taskSet.id + " (TID " + attemptInfo.taskId + ") on " + attemptInfo.host +
+        " as the attempt " + info.attemptNumber + " succeeded on " + info.host)
+      sched.backend.killTask(attemptInfo.taskId, attemptInfo.executorId, true)
--- End diff --

I think it would be better to have a killTask call in the taskScheduler (similar to cancelTask) rather than reaching in and getting the backend directly.
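A hedged sketch of the indirection being requested here; the method name killTaskAttempt and its exact placement on the scheduler are illustrative, not the API the PR ultimately adopted:

```scala
// In TaskSchedulerImpl: a narrow, synchronized entry point so that
// TaskSetManager never has to touch the backend directly.
def killTaskAttempt(taskId: Long, executorId: String, interruptThread: Boolean): Unit =
  synchronized {
    backend.killTask(taskId, executorId, interruptThread)
  }
```

The TaskSetManager call site would then read `sched.killTaskAttempt(attemptInfo.taskId, attemptInfo.executorId, true)`, keeping the backend an implementation detail of the scheduler.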
[GitHub] spark pull request: [SPARK-10530] [CORE] Kill other task attempts ...
Github user tgravescs commented on a diff in the pull request:
https://github.com/apache/spark/pull/11996#discussion_r58569079

--- Diff: core/src/test/scala/org/apache/spark/scheduler/TaskSetManagerSuite.scala ---
(same hunk quoted at the top of this thread, ending at the Thread.sleep(100) line)
--- End diff --

Ah, you are right; sorry, I looked at the wrong config.
[GitHub] spark pull request: [SPARK-10530] [CORE] Kill other task attempts ...
Github user devaraj-kavali commented on a diff in the pull request:
https://github.com/apache/spark/pull/11996#discussion_r58563479

--- Diff: core/src/test/scala/org/apache/spark/scheduler/TaskSetManagerSuite.scala ---
(same hunk quoted at the top of this thread, ending at the Thread.sleep(100) line)
--- End diff --

Thanks @tgravescs for your quick response. Here Thread.sleep(100) matches the threshold used in TaskSetManager.checkSpeculatableTasks(): a task must have been running for at least that long before it becomes eligible for a speculative attempt, and I don't see any way to change this default value:

> val medianDuration = durations(min((0.5 * tasksSuccessful).round.toInt, durations.length - 1))
> val threshold = max(SPECULATION_MULTIPLIER * medianDuration, 100)

I don't think this threshold value is related to the config "spark.speculation.interval" here.
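A worked instance of the quoted threshold computation, with durations chosen for illustration (SPECULATION_MULTIPLIER defaults to 1.5 in Spark): the multiplier term comes out below 100 ms, so the hard-coded floor is what actually gates speculation, which is exactly why the test sleeps for 100 ms.

```scala
import scala.math.{max, min}

val durations = Array(40L, 50L, 60L)    // sorted durations of successful tasks
val tasksSuccessful = durations.length  // 3
val SPECULATION_MULTIPLIER = 1.5        // Spark's default

val medianDuration =
  durations(min((0.5 * tasksSuccessful).round.toInt, durations.length - 1)) // durations(2) = 60
val threshold = max(SPECULATION_MULTIPLIER * medianDuration, 100)           // max(90.0, 100)

assert(threshold == 100.0) // the floor dominates, so the test must wait 100 ms
```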
[GitHub] spark pull request: [SPARK-10530] [CORE] Kill other task attempts ...
Github user tgravescs commented on a diff in the pull request:
https://github.com/apache/spark/pull/11996#discussion_r58542903

--- Diff: core/src/test/scala/org/apache/spark/scheduler/TaskSetManagerSuite.scala ---
(same hunk quoted at the top of this thread, ending at the Thread.sleep(100) line)
--- End diff --

It would be better to set spark.speculation.interval to a known small value in case the default changes. We might as well make it smaller too, so the test takes less time.
[GitHub] spark pull request: [SPARK-10530] [CORE] Kill other task attempts ...
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/11996#issuecomment-205789547

Test PASSed. Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/59477/
[GitHub] spark pull request: [SPARK-10530] [CORE] Kill other task attempts ...
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/11996#issuecomment-205789542

Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-10530] [CORE] Kill other task attempts ...
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/11996#issuecomment-205788878

**[Test build #54979 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54979/consoleFull)** for PR 11996 at commit [`ba9ffab`](https://github.com/apache/spark/commit/ba9ffab65f9f003af3a27671b8610525c2e38d84).

* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARK-10530] [CORE] Kill other task attempts ...
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/11996#issuecomment-20575

**[Test build #54979 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54979/consoleFull)** for PR 11996 at commit [`ba9ffab`](https://github.com/apache/spark/commit/ba9ffab65f9f003af3a27671b8610525c2e38d84).
[GitHub] spark pull request: [SPARK-10530] [CORE] Kill other task attempts ...
Github user devaraj-kavali commented on the pull request:
https://github.com/apache/spark/pull/11996#issuecomment-204313542

Thanks @tgravescs for checking this; I will add a test for these changes.
[GitHub] spark pull request: [SPARK-10530] [CORE] Kill other task attempts ...
Github user tgravescs commented on a diff in the pull request:
https://github.com/apache/spark/pull/11996#discussion_r58110175

--- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala ---
@@ -620,6 +620,14 @@ private[spark] class TaskSetManager(
     // Note: "result.value()" only deserializes the value when it's called at the first time, so
     // here "result.value()" just returns the value and won't block other threads.
     sched.dagScheduler.taskEnded(tasks(index), Success, result.value(), result.accumUpdates, info)
+    // Kill other task attempts if any as the one attempt succeeded
+    for (attemptInfo <- taskAttempts(index) if attemptInfo.attemptNumber != info.attemptNumber
--- End diff --

It would be nice if we could add a unit test for this in TaskSetManagerSuite.
[GitHub] spark pull request: [SPARK-10530] [CORE] Kill other task attempts ...
Github user tgravescs commented on the pull request:
https://github.com/apache/spark/pull/11996#issuecomment-203061970

Sorry, I haven't had time to look at this yet; hopefully in a day or so.
[GitHub] spark pull request: [SPARK-10530] [CORE] Kill other task attempts ...
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/11996#issuecomment-202470953

Test PASSed. Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/54314/
[GitHub] spark pull request: [SPARK-10530] [CORE] Kill other task attempts ...
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/11996#issuecomment-202470945

Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-10530] [CORE] Kill other task attempts ...
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/11996#issuecomment-202470303

**[Test build #54314 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54314/consoleFull)** for PR 11996 at commit [`1a9e365`](https://github.com/apache/spark/commit/1a9e36516e9016f43a605abce0ee49e1262363a6).

* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARK-10530] [CORE] Kill other task attempts ...
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/11996#issuecomment-202407567

**[Test build #54314 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54314/consoleFull)** for PR 11996 at commit [`1a9e365`](https://github.com/apache/spark/commit/1a9e36516e9016f43a605abce0ee49e1262363a6).
[GitHub] spark pull request: [SPARK-10530] [CORE] Kill other task attempts ...
Github user tgravescs commented on the pull request:
https://github.com/apache/spark/pull/11996#issuecomment-202407101

ok to test
[GitHub] spark pull request: [SPARK-10530] [CORE] Kill other task attempts ...
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/11996#issuecomment-202318667

Can one of the admins verify this patch?
[GitHub] spark pull request: [SPARK-10530] [CORE] Kill other task attempts ...
GitHub user devaraj-kavali opened a pull request:

https://github.com/apache/spark/pull/11996

[SPARK-10530] [CORE] Kill other task attempts when one attempt belonging to the same task succeeds in speculation

## What changes were proposed in this pull request?

With this patch, TaskSetManager kills the other running attempts when any one attempt succeeds for the same task. Killed tasks are no longer counted as failed tasks: they are listed separately in the UI, with task state KILLED instead of FAILED.

## How was this patch tested?

core/src/test/scala/org/apache/spark/ui/jobs/JobProgressListenerSuite.scala
core/src/test/scala/org/apache/spark/util/JsonProtocolSuite.scala

I have also verified this patch manually with spark.speculation enabled: when any attempt succeeds, the other running attempts for the same task are killed and pending tasks are assigned in their place; the killed attempts are reported as KILLED rather than FAILED. Please find the attached screenshots for reference.

![stage-tasks-table](https://cloud.githubusercontent.com/assets/3174804/14075132/394c6a12-f4f4-11e5-8638-20ff7b8cc9bc.png)
![stages-table](https://cloud.githubusercontent.com/assets/3174804/14075134/3b60f412-f4f4-11e5-9ea6-dd0dcc86eb03.png)

Ref: https://github.com/apache/spark/pull/11916

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/devaraj-kavali/spark SPARK-10530

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/11996.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #11996

commit 1a9e36516e9016f43a605abce0ee49e1262363a6
Author: Devaraj K
Date: 2016-03-28T09:03:07Z

    [SPARK-10530] [CORE] Kill other task attempts when one attempt belonging to the same task succeeds in speculation
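For context, a minimal sketch of how speculation might be enabled to reproduce the manual verification described above; the config keys are Spark's standard speculation settings, while the app name, master, and values are illustrative:

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Enable speculative execution so slow stragglers get a second attempt;
// once either attempt succeeds, this patch kills the other one.
val conf = new SparkConf()
  .setAppName("speculation-demo")              // illustrative name
  .setMaster("local[4]")
  .set("spark.speculation", "true")            // turn speculation on (off by default)
  .set("spark.speculation.multiplier", "1.5")  // how much slower than the median counts as a straggler
  .set("spark.speculation.quantile", "0.75")   // fraction of tasks that must finish before checking

val sc = new SparkContext(conf)
```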
[GitHub] spark pull request: [SPARK-10530] [CORE] Kill other task attempts ...
Github user tgravescs commented on the pull request:
https://github.com/apache/spark/pull/8683#issuecomment-188937905

@zjffdu any update on this?
[GitHub] spark pull request: [SPARK-10530] [CORE] Kill other task attempts ...
Github user chenghao-intel commented on a diff in the pull request:
https://github.com/apache/spark/pull/8683#discussion_r47100356

--- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala ---
@@ -220,6 +220,15 @@ private[spark] class TaskSchedulerImpl(
     }
   }

+  def killSpeculatedTask(taskId:Long) : Unit = synchronized {
--- End diff --

Nit: space after `:`
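For clarity, the signature with conventional Scala spacing applied (one space after each `:`, none before); the body is elided in the diff, so the comment below only marks where it would go:

```scala
// Scala style: no space before the colon, one space after it.
def killSpeculatedTask(taskId: Long): Unit = synchronized {
  // ... kill the speculated attempt for `taskId` ...
}
```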
[GitHub] spark pull request: [SPARK-10530] [CORE] Kill other task attempts ...
Github user chenghao-intel commented on a diff in the pull request:
https://github.com/apache/spark/pull/8683#discussion_r47100268

--- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala ---
@@ -628,6 +628,11 @@ private[spark] class TaskSetManager(
     // here "result.value()" just returns the value and won't block other threads.
     sched.dagScheduler.taskEnded(
       tasks(index), Success, result.value(), result.accumUpdates, info, result.metrics)
+    // kill other attempts of this task when speculation is enabled.
--- End diff --

I am not so sure about this part, but would it be better to move this code to after we have confirmed that the task succeeded?
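A hedged sketch of the ordering being suggested, using field names that appear elsewhere in this thread (successful, tasksSuccessful, taskAttempts); this is illustrative only, not the code either PR settled on:

```scala
if (!successful(index)) {
  tasksSuccessful += 1
  successful(index) = true   // 1. record the task as succeeded first
  taskAttempts(index)
    .filter(_.running)       // 2. only then reap any still-running attempts
    .foreach { attempt =>
      sched.backend.killTask(attempt.taskId, attempt.executorId, true)
    }
}
```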
[GitHub] spark pull request: [SPARK-10530] [CORE] Kill other task attempts ...
Github user zjffdu commented on the pull request:
https://github.com/apache/spark/pull/8683#issuecomment-161672510

@tgravescs sorry for the late response; I will add a unit test in the next few days.
[GitHub] spark pull request: [SPARK-10530] [CORE] Kill other task attempts ...
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/8683#issuecomment-161671057

Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-10530] [CORE] Kill other task attempts ...
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/8683#issuecomment-161671060

Test FAILed. Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/47143/
[GitHub] spark pull request: [SPARK-10530] [CORE] Kill other task attempts ...
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/8683#issuecomment-161671052

**[Test build #47143 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47143/consoleFull)** for PR 8683 at commit [`95c9022`](https://github.com/apache/spark/commit/95c90220eb7a520b85699d6041163faa0265897e).

* This patch **fails Scala style tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARK-10530] [CORE] Kill other task attempts ...
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/8683#issuecomment-161670306

**[Test build #47143 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47143/consoleFull)** for PR 8683 at commit [`95c9022`](https://github.com/apache/spark/commit/95c90220eb7a520b85699d6041163faa0265897e).
[GitHub] spark pull request: [SPARK-10530] [CORE] Kill other task attempts ...
Github user tgravescs commented on the pull request:
https://github.com/apache/spark/pull/8683#issuecomment-161665421

What testing have you done on this? I think the DAGSchedulerSuite will need to be updated for the TaskScheduler API change, and it would be nice if we could add a test for this scenario as well.
[GitHub] spark pull request: [SPARK-10530] [CORE] Kill other task attempts ...
Github user tgravescs commented on the pull request:
https://github.com/apache/spark/pull/8683#issuecomment-161665116

Jenkins, test this please
[GitHub] spark pull request: [SPARK-10530] [CORE] Kill other task attempts ...
Github user zjffdu commented on the pull request:
https://github.com/apache/spark/pull/8683#issuecomment-139149931

Thanks for the tips, @dragos. Pushed another commit with the changes.