[GitHub] spark pull request: [SPARK-10796][CORE]Resubmit stage while lost t...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8927#issuecomment-218197910 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/58236/ Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8927#issuecomment-218197909 Build finished. Test PASSed.
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/8927#issuecomment-218197559 **[Test build #58236 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58236/consoleFull)** for PR 8927 at commit [`3bf1eaa`](https://github.com/apache/spark/commit/3bf1eaaddc3661241b7558abd4f74cc3173aba34). * This patch passes all tests. * This patch **does not merge cleanly**. * This patch adds no public classes.
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8927#issuecomment-218192384 Build finished. Test PASSed.
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8927#issuecomment-218192387 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/58235/ Test PASSed.
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/8927#issuecomment-218192114 **[Test build #58235 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58235/consoleFull)** for PR 8927 at commit [`743a1e6`](https://github.com/apache/spark/commit/743a1e62b4a91bddeec2d3b3a48a3c0843c58d73). * This patch passes all tests. * This patch **does not merge cleanly**. * This patch adds no public classes.
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/8927#issuecomment-218153922 **[Test build #58236 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58236/consoleFull)** for PR 8927 at commit [`3bf1eaa`](https://github.com/apache/spark/commit/3bf1eaaddc3661241b7558abd4f74cc3173aba34).
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/8927#issuecomment-218151502 **[Test build #58235 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58235/consoleFull)** for PR 8927 at commit [`743a1e6`](https://github.com/apache/spark/commit/743a1e62b4a91bddeec2d3b3a48a3c0843c58d73).
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8927#issuecomment-213389098 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/56692/ Test FAILed.
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8927#issuecomment-213389096 Merged build finished. Test FAILed.
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/8927#issuecomment-213388941 **[Test build #56692 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/56692/consoleFull)** for PR 8927 at commit [`259698e`](https://github.com/apache/spark/commit/259698e65c3fa78d06a74e3a2e211a7c5c2ce66e). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/8927#issuecomment-213364058 **[Test build #56692 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/56692/consoleFull)** for PR 8927 at commit [`259698e`](https://github.com/apache/spark/commit/259698e65c3fa78d06a74e3a2e211a7c5c2ce66e).
Github user suyanNone commented on the pull request: https://github.com/apache/spark/pull/8927#issuecomment-213363612 @squito @srowen
Github user suyanNone commented on a diff in the pull request: https://github.com/apache/spark/pull/8927#discussion_r60716314

--- Diff: core/src/test/scala/org/apache/spark/scheduler/DAGSchedulerSuite.scala ---
@@ -1083,8 +1085,6 @@ class DAGSchedulerSuite extends SparkFunSuite with LocalSparkContext with Timeou
Success, makeMapStatus("hostA", reduceRdd.partitions.size)))
assert(shuffleStage.numAvailableOutputs === 2)
-assert(mapOutputTracker.getMapSizesByExecutorId(shuffleId, 0).map(_._1).toSet ===
--- End diff --

With this PR, when an executor is lost, a running stage no longer registers its `outputLocs`.
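The following minimal Scala sketch makes the rule in the comment above concrete. It is a toy model under stated assumptions. `ToyShuffleStage`, the `ToyTracker` map and `handleExecutorLost` are hypothetical stand-ins and are not Spark's `DAGScheduler` or `MapOutputTracker` API. The sketch assumes that every shuffle stage forgets the map outputs that lived on the lost executor. Only stages that are not running then publish their remaining outputs to the tracker, and a running stage leaves the tracker untouched.

```scala
object ExecutorLostRegistration {
  // Hypothetical stand-in for a shuffle map stage: partition -> executor holding its map output.
  final case class ToyShuffleStage(shuffleId: Int, isRunning: Boolean, outputLocs: Map[Int, String])

  // Hypothetical stand-in for the map output tracker: shuffleId -> (partition -> executor).
  type ToyTracker = Map[Int, Map[Int, String]]

  def handleExecutorLost(
      execId: String,
      stages: Seq[ToyShuffleStage],
      tracker: ToyTracker): (Seq[ToyShuffleStage], ToyTracker) = {
    // Assumption: every stage drops the map outputs that lived on the lost executor.
    val pruned = stages.map { s =>
      s.copy(outputLocs = s.outputLocs.filter { case (_, exec) => exec != execId })
    }
    // Only stages that are not running (re)register their outputs;
    // a running stage leaves the tracker untouched.
    val updated = pruned.foldLeft(tracker) { (t, s) =>
      if (s.isRunning) t else t.updated(s.shuffleId, s.outputLocs)
    }
    (pruned, updated)
  }
}
```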
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8927#issuecomment-213362011 Merged build finished. Test FAILed.
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8927#issuecomment-213362013 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/56691/ Test FAILed.
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/8927#issuecomment-213362001 **[Test build #56691 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/56691/consoleFull)** for PR 8927 at commit [`70af484`](https://github.com/apache/spark/commit/70af48497caa28825c1828b9f8d6b635b7e23c7b). * This patch **fails Scala style tests**. * This patch merges cleanly. * This patch adds no public classes.
Github user suyanNone commented on a diff in the pull request: https://github.com/apache/spark/pull/8927#discussion_r60715521

--- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala ---
@@ -1416,6 +1449,7 @@ class DAGScheduler(
outputCommitCoordinator.stageEnd(stage.id)
listenerBus.post(SparkListenerStageCompleted(stage.latestInfo))
+taskScheduler.zombieTasks(stage.id)
--- End diff --

Once a stage has finished, its previous TaskSets should be marked as zombies.
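Here is a minimal Scala sketch of the idea in the comment above. It uses hypothetical `ToyTaskSetManager` and `ToyTaskScheduler` classes rather than Spark's real `TaskSetManager` and `TaskSchedulerImpl`. The name `zombieTasks` only mirrors the method that the diff calls. Its body here is an illustration and not the PR's implementation: when a stage completes, every task-set attempt registered for that stage is marked as a zombie, so none of them can launch more tasks.

```scala
import scala.collection.mutable

object ZombieOnStageEnd {
  // Hypothetical stand-in for a TaskSetManager: only tracks whether it may launch tasks.
  final class ToyTaskSetManager(val stageId: Int, val stageAttemptId: Int) {
    var isZombie: Boolean = false
  }

  // Hypothetical stand-in for the scheduler's per-stage registry of task-set attempts.
  final class ToyTaskScheduler {
    private val taskSetsByStage =
      mutable.Map.empty[Int, mutable.Map[Int, ToyTaskSetManager]]

    def submit(tsm: ToyTaskSetManager): Unit = {
      val attempts = taskSetsByStage.getOrElseUpdate(
        tsm.stageId, mutable.Map.empty[Int, ToyTaskSetManager])
      attempts(tsm.stageAttemptId) = tsm
    }

    // Illustrative body for the `zombieTasks(stage.id)` call in the diff:
    // mark every attempt of the finished stage as a zombie.
    def zombieTasks(stageId: Int): Unit =
      taskSetsByStage.get(stageId).foreach { attempts =>
        attempts.values.foreach(tsm => tsm.isZombie = true)
      }

    // True if some attempt of the stage could still launch tasks.
    def hasLaunchableTaskSet(stageId: Int): Boolean =
      taskSetsByStage.get(stageId).exists(_.values.exists(tsm => !tsm.isZombie))
  }
}
```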
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/8927#issuecomment-213361689 **[Test build #56691 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/56691/consoleFull)** for PR 8927 at commit [`70af484`](https://github.com/apache/spark/commit/70af48497caa28825c1828b9f8d6b635b7e23c7b).
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8927#issuecomment-212335488 Merged build finished. Test FAILed.
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/8927#issuecomment-212335401 **[Test build #56345 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/56345/consoleFull)** for PR 8927 at commit [`fb478bb`](https://github.com/apache/spark/commit/fb478bb389e1ad021c24306e6718f645f6d8dd10). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8927#issuecomment-212335494 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/56345/ Test FAILed.
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/8927#issuecomment-212331847 **[Test build #56345 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/56345/consoleFull)** for PR 8927 at commit [`fb478bb`](https://github.com/apache/spark/commit/fb478bb389e1ad021c24306e6718f645f6d8dd10).
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8927#issuecomment-212324011 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/56344/ Test FAILed.
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8927#issuecomment-212324005 Merged build finished. Test FAILed.
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/8927#issuecomment-212323979 **[Test build #56344 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/56344/consoleFull)** for PR 8927 at commit [`1be6071`](https://github.com/apache/spark/commit/1be6071956c6a93e2d264fb1c2db92d01c4d3fe4). * This patch **fails Scala style tests**. * This patch merges cleanly. * This patch adds no public classes.
Github user suyanNone commented on the pull request: https://github.com/apache/spark/pull/8927#issuecomment-212322683 jenkins retest this please
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/8927#issuecomment-212322883 **[Test build #56344 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/56344/consoleFull)** for PR 8927 at commit [`1be6071`](https://github.com/apache/spark/commit/1be6071956c6a93e2d264fb1c2db92d01c4d3fe4).
GitHub user suyanNone reopened a pull request: https://github.com/apache/spark/pull/8927

[SPARK-10796][CORE]Resubmit stage while lost task in Zombie TaskSets

We hit this problem in Spark 1.3.0, and I can also reproduce it on the latest version.

Description:

1. A running `ShuffleMapStage` can have multiple `TaskSet`s: one active TaskSet and possibly several zombie TaskSets.
2. A running `ShuffleMapStage` is successful only when all of its partitions have been processed successfully, i.e. every task's `MapStatus` has been added to `outputLocs`.
3. The `MapStatus`es of a running `ShuffleMapStage` may be produced by zombie TaskSet1, zombie TaskSet2, ..., or the active TaskSetN. A given `MapStatus` may have been produced by only one TaskSet, and that TaskSet may be a zombie.
4. When an executor is lost, some of the `MapStatus`es stored on it may have been produced by a zombie TaskSet. The current logic handles lost `MapStatus`es by having each TaskSet re-run the tasks that had succeeded on the lost executor. Those tasks are re-added to the TaskSet's `pendingTasks`, and their partitions are re-added to the stage's `pendingPartitions`. This does not help when the lost `MapStatus` belongs only to a zombie TaskSet. Because that TaskSet is a zombie, its `pendingTasks` are never scheduled.
5. A stage is resubmitted only if some task throws a `FetchFailedException`. The lost executor may hold none of the `MapStatus`es of a running stage's parent stage, so no fetch failure occurs. Yet the running stage may have lost a `MapStatus` that belonged only to a zombie TaskSet. Once every zombie TaskSet has finished its running tasks and the active TaskSet has finished its pending tasks, `TaskSchedulerImpl` removes them all. The running stage's pending partitions are still non-empty, so the stage hangs.
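The hang described in points 4 and 5 comes down to a coverage check, sketched below in Scala. All names are hypothetical: `TaskSetView`, `orphanedPartitions` and `needsResubmit` are illustrative and are not Spark APIs. The sketch makes two assumptions, in line with the description. First, pending tasks are launched only from non-zombie TaskSets. Second, a task that is already running may still finish in any TaskSet. A partition that the stage still needs but that no TaskSet can produce will never complete. That is the situation in which this PR proposes resubmitting the stage.

```scala
object ZombieCheck {
  // Hypothetical snapshot of one TaskSet attempt of a ShuffleMapStage.
  final case class TaskSetView(
      attemptId: Int,
      isZombie: Boolean,
      pendingTasks: Set[Int], // partitions queued but not yet launched
      runningTasks: Set[Int]  // partitions currently executing
  )

  // Partitions the stage still needs that no TaskSet will ever produce.
  def orphanedPartitions(stagePending: Set[Int], taskSets: Seq[TaskSetView]): Set[Int] = {
    // Only non-zombie TaskSets get their pending tasks scheduled...
    val willLaunch = taskSets.filterNot(_.isZombie).flatMap(_.pendingTasks)
    // ...but a task that is already running may finish in any TaskSet.
    val mayFinish = taskSets.flatMap(_.runningTasks)
    stagePending -- willLaunch -- mayFinish
  }

  // If some needed partition is orphaned, waiting cannot help: resubmit the stage.
  def needsResubmit(stagePending: Set[Int], taskSets: Seq[TaskSetView]): Boolean =
    orphanedPartitions(stagePending, taskSets).nonEmpty
}
```

The check looks at coverage instead of waiting for a `FetchFailedException`, because, as point 5 notes, no fetch failure may ever be raised in this situation.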
Example:
Running stage 0.0 runs TaskSet0.0: task0.0 finished on ExecA, task1.0 is running on ExecB, and task2.0 is waiting.
---> task1.0 throws `FetchFailedException`.
---> Resubmitted stage 0.1 runs TaskSet0.1 (which re-runs task 1 and task 2); assume task 1 finishes on ExecA.
---> ExecA is lost, and it happens that no task throws `FetchFailedException`.
---> TaskSet0.1 resubmits task 1, re-adds it to its `pendingTasks`, and waits for `TaskSchedulerImpl` to schedule it. TaskSet0.0 also resubmits task 0 and re-adds it to its `pendingTasks`. Because TaskSet0.0 is a zombie, `TaskSchedulerImpl` skips it when scheduling.

So once TaskSet0.0 and TaskSet0.1 are both zombies with no running tasks, `TaskSchedulerImpl` removes both TaskSets. The `DAGScheduler` still has pending partitions because of the task lost from TaskSet0.0, but all of the stage's TaskSets have been removed, so the stage hangs.

You can merge this pull request into a Git repository by running:
$ git pull https://github.com/suyanNone/spark rerun-special

Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/8927.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:
This closes #8927

commit 1be6071956c6a93e2d264fb1c2db92d01c4d3fe4
Author: hushan
Date: 2016-04-20T08:21:51Z
Fix zombieTasksets and RemovedTaskset lost output
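The example timeline above can be replayed in a small, self-contained Scala simulation. It is a toy model with hypothetical names (`ToyTaskSet`, `replay`, and executors `execA`, `execB` and `execC`), not Spark's scheduler. It rests on four assumptions:

- each TaskSet re-queues only the tasks it completed on the lost executor;
- only non-zombie TaskSets launch pending tasks;
- a TaskSet with nothing left to run also becomes a zombie;
- zombie TaskSets with no running tasks are removed.

The run ends with partition 0 still pending and no TaskSet left to run it, which is the hang.

```scala
import scala.collection.mutable

object ZombieHangReplay {
  // Hypothetical toy TaskSet: not Spark's TaskSetManager.
  final class ToyTaskSet {
    var isZombie: Boolean = false
    val pendingTasks: mutable.Set[Int] = mutable.Set.empty[Int]
    val runningTasks: mutable.Set[Int] = mutable.Set.empty[Int]
    // partition -> executor on which this TaskSet's copy of the task succeeded
    val succeededOn: mutable.Map[Int, String] = mutable.Map.empty[Int, String]

    // On executor loss, a TaskSet re-queues only the tasks that *it* completed there.
    def executorLost(execId: String): List[Int] = {
      val lost = succeededOn.toList.collect { case (p, e) if e == execId => p }
      lost.foreach { p =>
        succeededOn.remove(p)
        pendingTasks += p
      }
      lost
    }

    def markSucceeded(p: Int, execId: String): Unit = {
      runningTasks -= p
      succeededOn(p) = execId
    }
  }

  // Returns (partitions the stage still waits for, number of TaskSets left).
  def replay(): (Set[Int], Int) = {
    val stagePending = mutable.Set(0, 1, 2)

    // Stage 0.0: task 0 succeeds on execA; task 1 hits a FetchFailedException,
    // so TaskSet0.0 becomes a zombie.
    val ts00 = new ToyTaskSet
    ts00.markSucceeded(0, "execA"); stagePending -= 0
    ts00.isZombie = true

    // Stage 0.1: TaskSet0.1 re-runs partitions 1 and 2; task 1 succeeds on execA.
    val ts01 = new ToyTaskSet
    ts01.markSucceeded(1, "execA"); stagePending -= 1
    ts01.runningTasks += 2

    // execA is lost and no FetchFailedException is raised.
    val taskSets = List(ts00, ts01)
    taskSets.foreach(ts => stagePending ++= ts.executorLost("execA"))

    // Task 2 finishes on execB; only non-zombie TaskSets launch pending tasks.
    ts01.markSucceeded(2, "execB"); stagePending -= 2
    for (ts <- taskSets if !ts.isZombie; p <- ts.pendingTasks.toList) {
      ts.pendingTasks -= p
      ts.markSucceeded(p, "execC"); stagePending -= p
    }
    // A TaskSet with nothing left to run becomes a zombie as well.
    for (ts <- taskSets if ts.pendingTasks.isEmpty && ts.runningTasks.isEmpty) {
      ts.isZombie = true
    }

    // Zombie TaskSets with no running tasks are removed.
    val remaining = taskSets.filterNot(ts => ts.isZombie && ts.runningTasks.isEmpty)
    (stagePending.toSet, remaining.size)
  }
}
```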
Github user suyanNone closed the pull request at: https://github.com/apache/spark/pull/8927
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8927#issuecomment-143774744 Merged build finished. Test PASSed.
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/8927#issuecomment-143774601 [Test build #43060 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/43060/console) for PR 8927 at commit [`ce83c9b`](https://github.com/apache/spark/commit/ce83c9b565fe77591750c895f0e657d2f42cf851). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8927#issuecomment-143774745 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/43060/ Test PASSed.
Github user suyanNone commented on the pull request: https://github.com/apache/spark/pull/8927#issuecomment-14378 jenkins retest this please
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8927#issuecomment-143723906 Merged build finished. Test FAILed.
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/8927#issuecomment-143723901 [Test build #43059 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/43059/console) for PR 8927 at commit [`301da0a`](https://github.com/apache/spark/commit/301da0a20c94084bc8f783cd0e087e63f07e2124). * This patch **fails Scala style tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-10796][CORE]Resubmit stage while lost t...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8927#issuecomment-143723907 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/43059/ Test FAILed.
[GitHub] spark pull request: [SPARK-10796][CORE]Resubmit stage while lost t...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8927#issuecomment-143732059 Merged build started.
[GitHub] spark pull request: [SPARK-10796][CORE]Resubmit stage while lost t...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8927#issuecomment-143732034 Merged build triggered.
[GitHub] spark pull request: [SPARK-10796][CORE]Resubmit stage while lost t...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8927#issuecomment-143722412 Merged build started.
[GitHub] spark pull request: [SPARK-10796][CORE]Resubmit stage while lost t...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8927#issuecomment-143722397 Merged build triggered.
[GitHub] spark pull request: [SPARK-10796][CORE]Resubmit stage while lost t...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/8927#issuecomment-143723400 [Test build #43059 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/43059/consoleFull) for PR 8927 at commit [`301da0a`](https://github.com/apache/spark/commit/301da0a20c94084bc8f783cd0e087e63f07e2124).
[GitHub] spark pull request: [SPARK-10796][CORE]Resubmit stage while lost t...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8927#issuecomment-143698654 Merged build started.
[GitHub] spark pull request: [SPARK-10796][CORE]Resubmit stage while lost t...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8927#issuecomment-143698641 Merged build triggered.
[GitHub] spark pull request: [SPARK-10796][CORE]Resubmit stage while lost t...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8927#issuecomment-143699525 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/43057/ Test FAILed.
[GitHub] spark pull request: [SPARK-10796][CORE]Resubmit stage while lost t...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/8927#issuecomment-143699519 [Test build #43057 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/43057/console) for PR 8927 at commit [`301da0a`](https://github.com/apache/spark/commit/301da0a20c94084bc8f783cd0e087e63f07e2124). * This patch **fails Scala style tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-10796][CORE]Resubmit stage while lost t...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8927#issuecomment-143699522 Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-10796][CORE]Resubmit stage while lost t...
Github user suyanNone commented on the pull request: https://github.com/apache/spark/pull/8927#issuecomment-143698080 I reproduced the problem, so I am reopening this.
[GitHub] spark pull request: [SPARK-10796][CORE]Resubmit stage while lost t...
GitHub user suyanNone reopened a pull request: https://github.com/apache/spark/pull/8927 [SPARK-10796][CORE]Resubmit stage while lost task in Zombie TaskSets

We hit this problem in Spark 1.3.0. I also checked the latest Spark code, and I think the problem still exists.
desc:
1. A running `ShuffleMapStage` can have multiple `TaskSet`s: one active TaskSet and several zombie TaskSets.
2. A running `ShuffleMapStage` succeeds only when all of its partitions have been processed successfully, i.e. every task's MapStatus has been added to `outputLocs`.
3. The MapStatuses of a running `ShuffleMapStage` may be produced by any of Zombie TaskSet1, Zombie TaskSet2, ..., Active TaskSetN, and a given MapStatus may belong to only one TaskSet, which may be a zombie.
4. When an executor is lost, some of the MapStatuses stored on it may have been produced by a zombie TaskSet. In the current logic, lost MapStatuses are recovered by having each TaskSet re-run the tasks that succeeded on the lost executor: they are re-added to the `TaskSet's pendingTasks`, and their partitions are re-added to the `Stage's pendingPartitions`. This does not help when the lost MapStatus belongs only to a zombie TaskSet, because a zombie TaskSet never schedules its `pendingTasks`.
5. A stage is resubmitted only when some task throws a `FetchFailedException`. However, the lost executor may hold no parent-stage MapStatus needed by any running stage (so no fetch fails), while a running `Stage` happens to have lost a MapStatus that belongs only to a zombie TaskSet. Once all zombie TaskSets have finished their running tasks and the active TaskSet has finished its pending tasks, `TaskSchedulerImpl` removes them, yet the running stage's pending partitions are still non-empty, so the stage hangs.
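To make points 4 and 5 concrete, here is a minimal Python sketch of the bookkeeping involved, replaying the scenario from the Examples paragraph. It is a toy model, not Spark code: `ToyTaskSet`, `ToyStage`, `on_executor_lost`, `stuck_partitions` and `needs_resubmit` are hypothetical names, and the model assumes, as the description states, that a zombie TaskSet never launches its pending tasks and that the stage is otherwise resubmitted only on a `FetchFailedException`.

```python
from dataclasses import dataclass, field

# Toy model only: all names are hypothetical simplifications, not Spark's
# DAGScheduler / TaskSetManager / TaskSchedulerImpl API.


@dataclass
class ToyTaskSet:
    attempt: int
    pending: set = field(default_factory=set)  # partitions queued for launch
    running: set = field(default_factory=set)  # partitions with a task in flight
    is_zombie: bool = False                    # zombies never launch pending tasks


@dataclass
class ToyStage:
    output_locs: dict = field(default_factory=dict)  # partition -> executor holding its MapStatus
    produced_by: dict = field(default_factory=dict)  # partition -> TaskSet attempt that produced it
    pending_partitions: set = field(default_factory=set)


def on_executor_lost(stage, task_sets, exec_id):
    """Point 4: drop map outputs stored on exec_id and re-queue each lost partition
    in the TaskSet attempt that produced it and in the stage's pending partitions."""
    lost = sorted(p for p, e in stage.output_locs.items() if e == exec_id)
    for p in lost:
        del stage.output_locs[p]
        stage.pending_partitions.add(p)
        task_sets[stage.produced_by[p]].pending.add(p)
    return lost


def stuck_partitions(stage, task_sets):
    """Point 5: pending partitions that no TaskSet will ever run. A partition can make
    progress only if a task for it is running, or a non-zombie TaskSet has it queued."""
    def can_progress(p):
        return any(p in ts.running or (p in ts.pending and not ts.is_zombie)
                   for ts in task_sets.values())
    return {p for p in stage.pending_partitions if not can_progress(p)}


def needs_resubmit(stage, task_sets):
    """Hypothetical trigger in the spirit of the PR title: resubmit the stage when a lost
    partition can only be recovered by a zombie TaskSet, even without a fetch failure."""
    return bool(stuck_partitions(stage, task_sets))


# Replay of the Examples scenario for stage 0 with partitions 0, 1, 2.
stage = ToyStage()
# TaskSet0.0: task 0 finished on ExecA, task 2 still waiting; task 1 hit a
# FetchFailedException, so TaskSet0.0 became a zombie and the stage was resubmitted.
ts00 = ToyTaskSet(attempt=0, pending={2}, is_zombie=True)
stage.output_locs[0], stage.produced_by[0] = "ExecA", 0
# TaskSet0.1 re-runs partitions 1 and 2; task 1 has finished on ExecA, task 2 is running.
ts01 = ToyTaskSet(attempt=1, running={2})
stage.output_locs[1], stage.produced_by[1] = "ExecA", 1
stage.pending_partitions = {2}
task_sets = {0: ts00, 1: ts01}

# ExecA is lost and no FetchFailedException follows.
lost = on_executor_lost(stage, task_sets, "ExecA")
print("lost:", lost)                                         # lost: [0, 1]
print("stuck:", sorted(stuck_partitions(stage, task_sets)))  # stuck: [0]
print("resubmit?", needs_resubmit(stage, task_sets))         # resubmit? True
```

Partition 1 recovers because TaskSet0.1 is still live, but partition 0 was produced by the zombie TaskSet0.0, so nothing will ever run it again. A check shaped like `needs_resubmit` is only one reading of the PR title; the actual patch may detect and handle this case differently.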
Examples: Running stage 0.0, running TaskSet0.0: Task0.0 finished on ExecA, Task1.0 running on ExecB, Task2.0 waiting.
---> Task1.0 throws a FetchFailedException.
---> Resubmitted stage 0.1 runs TaskSet0.1 (which re-runs Task1 and Task2); assume Task1 finishes on ExecA.
---> ExecA is lost, and it happens that no task throws a FetchFailedException.
---> TaskSet0.1 resubmits Task1, re-adds it to pendingTasks, and waits for TaskSchedulerImpl to schedule it. TaskSet0.0 also resubmits Task0 and re-adds it to pendingTasks, but because TaskSet0.0 is a zombie, TaskSchedulerImpl skips scheduling it.
So once TaskSet0.0 and TaskSet0.1 both satisfy (isZombie && runningTasks.empty), TaskSchedulerImpl removes them. The DAGScheduler still has pending partitions because of the task lost in TaskSet0.0, but all of the stage's TaskSets have been removed, so the job hangs.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/suyanNone/spark rerun-special
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/8927.patch
To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:
This closes #8927
commit 554c61f800c6c1b25b1002a7255569a9c38e4154 Author: hushan Date: 2015-09-24T09:49:22Z rerun-specail
commit 3b4a683d23f951082df0b9d29dfa094683d235ea Author: hushan Date: 2015-09-28T03:09:05Z refine
commit f845f33563623a9f3d6858aba893ed8c75453403 Author: hushan Date: 2015-09-28T03:14:18Z refine
commit 301da0a20c94084bc8f783cd0e087e63f07e2124 Author: hushan Date: 2015-09-28T03:18:46Z refine
[GitHub] spark pull request: [SPARK-10796][CORE]Resubmit stage while lost t...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/8927#issuecomment-143699213 [Test build #43057 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/43057/consoleFull) for PR 8927 at commit [`301da0a`](https://github.com/apache/spark/commit/301da0a20c94084bc8f783cd0e087e63f07e2124).
[GitHub] spark pull request: [SPARK-10796][CORE]Resubmit stage while lost t...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/8927#issuecomment-143734029 [Test build #43060 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/43060/consoleFull) for PR 8927 at commit [`ce83c9b`](https://github.com/apache/spark/commit/ce83c9b565fe77591750c895f0e657d2f42cf851).
[GitHub] spark pull request: [SPARK-10796][CORE]Resubmit stage while lost t...
GitHub user suyanNone opened a pull request: https://github.com/apache/spark/pull/8927 [SPARK-10796][CORE]Resubmit stage while lost task in Zombie TaskSets

We hit this problem in Spark 1.3.0. I also checked the latest Spark code, and I think the problem still exists.
1. A running `ShuffleMapStage` can have multiple `TaskSet`s: one active TaskSet and several zombie TaskSets.
2. A running `ShuffleMapStage` succeeds only when all of its partitions have been processed successfully, i.e. every task's MapStatus has been added to `outputLocs`.
3. The MapStatuses of a running `ShuffleMapStage` may be produced by any of Zombie TaskSet1, Zombie TaskSet2, ..., Active TaskSetN, and a given MapStatus may belong to only one TaskSet, which may be a zombie.
4. When an executor is lost, some of the MapStatuses stored on it may have been produced by a zombie TaskSet. In the current logic, lost MapStatuses are recovered by having each TaskSet re-run the tasks that succeeded on the lost executor: they are re-added to the `TaskSet's pendingTasks`, and their partitions are re-added to the `Stage's pendingPartitions`. This does not help when the lost MapStatus belongs only to a zombie TaskSet, because a zombie TaskSet never schedules its `pendingTasks`.
5. A stage is resubmitted only when some task throws a `FetchFailedException`. However, the lost executor may hold no parent-stage MapStatus needed by any running stage (so no fetch fails), while a running `Stage` happens to have lost a MapStatus that belongs only to a zombie TaskSet. Once all zombie TaskSets have finished their running tasks and the active TaskSet has finished its pending tasks, `TaskSchedulerImpl` removes them, yet the running stage's pending partitions are still non-empty, so the stage hangs.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/suyanNone/spark rerun-special
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/8927.patch
To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:
This closes #8927
commit 554c61f800c6c1b25b1002a7255569a9c38e4154 Author: hushan Date: 2015-09-24T09:49:22Z rerun-specail
commit 3b4a683d23f951082df0b9d29dfa094683d235ea Author: hushan Date: 2015-09-28T03:09:05Z refine
commit f845f33563623a9f3d6858aba893ed8c75453403 Author: hushan Date: 2015-09-28T03:14:18Z refine
commit 301da0a20c94084bc8f783cd0e087e63f07e2124 Author: hushan Date: 2015-09-28T03:18:46Z refine
[GitHub] spark pull request: [SPARK-10796][CORE]Resubmit stage while lost t...
Github user suyanNone closed the pull request at: https://github.com/apache/spark/pull/8927
[GitHub] spark pull request: [SPARK-10796][CORE]Resubmit stage while lost t...
Github user suyanNone commented on the pull request: https://github.com/apache/spark/pull/8927#issuecomment-143632342 I will run a test job on the latest code to confirm whether the problem still exists.