[GitHub] spark issue #21577: [SPARK-24589][core] Correctly identify tasks in output c...
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/21577 Thanks for fixing this, @vanzin! --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21577: [SPARK-24589][core] Correctly identify tasks in output c...
Github user tgravescs commented on the issue: https://github.com/apache/spark/pull/21577 Yeah sorry about that, my fault. I merged the fix - SPARK-22897 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21577: [SPARK-24589][core] Correctly identify tasks in output c...
Github user zzcclp commented on the issue: https://github.com/apache/spark/pull/21577 @vanzin @tgravescs , after merge this pr into branch-2.2, there is an error "stageAttemptNumber is not a member of org.apache.spark.TaskContext" in SparkHadoopMapRedUtil, I think it needs to merge PR-20082 first. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21577: [SPARK-24589][core] Correctly identify tasks in output c...
Github user tgravescs commented on the issue: https://github.com/apache/spark/pull/21577 merged to master, 2.3, and 2.2 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21577: [SPARK-24589][core] Correctly identify tasks in output c...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/21577 > I pushed the change for that in: vanzin/spark@e6a862e I like it, it's simpler to use task id to replace stage attempt id and task attempt id. For safety we should do it in master only after this PR is merged. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21577: [SPARK-24589][core] Correctly identify tasks in output c...
Github user tgravescs commented on the issue: https://github.com/apache/spark/pull/21577 I will --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21577: [SPARK-24589][core] Correctly identify tasks in output c...
Github user vanzin commented on the issue: https://github.com/apache/spark/pull/21577 So anyone wants to do the actual merging? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21577: [SPARK-24589][core] Correctly identify tasks in output c...
Github user tgravescs commented on the issue: https://github.com/apache/spark/pull/21577 +1 this is a bit of a side while looking through the scenarios I filed: https://issues.apache.org/jira/browse/SPARK-24622 . shouldn't be a problem here though with this fix. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21577: [SPARK-24589][core] Correctly identify tasks in output c...
Github user tgravescs commented on the issue: https://github.com/apache/spark/pull/21577 the code here lgtm, I was trying to make one more pass through all the scenarios but got stuck in meetings, will try to do it later tonight or tomorrow morning but we can always have another follow up if we find another case. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21577: [SPARK-24589][core] Correctly identify tasks in output c...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21577 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/92138/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21577: [SPARK-24589][core] Correctly identify tasks in output c...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21577 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21577: [SPARK-24589][core] Correctly identify tasks in output c...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21577 **[Test build #92138 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92138/testReport)** for PR 21577 at commit [`264c533`](https://github.com/apache/spark/commit/264c533737410786faae24df8cb5b27218f804cd). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21577: [SPARK-24589][core] Correctly identify tasks in output c...
Github user vanzin commented on the issue: https://github.com/apache/spark/pull/21577 I pushed the change for that in: https://github.com/vanzin/spark/commit/e6a862ecb83c64a0ea2f5bd469bc0febe25e15ba In case anyone wants to take a look. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21577: [SPARK-24589][core] Correctly identify tasks in output c...
Github user vanzin commented on the issue: https://github.com/apache/spark/pull/21577 I filed SPARK-24611 to track some enhancements to this part of the code that have been discussed here. Of those, I'd consider the "use task IDs instead of TaskIdentifier" as something we could potentially do here, but at the same time I don't really want to delay this patch too much. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21577: [SPARK-24589][core] Correctly identify tasks in output c...
Github user vanzin commented on the issue: https://github.com/apache/spark/pull/21577 Sounds good to me (although I'm trying the change locally and unit tests are so far happy). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21577: [SPARK-24589][core] Correctly identify tasks in output c...
Github user tgravescs commented on the issue: https://github.com/apache/spark/pull/21577 > Ah, right, d'oh. I just checked about whether stages register with the coordinator, and saw the duplicate registration for the resubmitted map stage. Yeah I noticed that to but I think we should perhaps file separate jira and only do that in 2.4 and maybe 2.3.2 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21577: [SPARK-24589][core] Correctly identify tasks in output c...
Github user mridulm commented on the issue: https://github.com/apache/spark/pull/21577 > * t2 finishes before that kill message arrives, is allowed to commit. > If that can happen it would generate a duplicate map output; but my guess (hope?) is that the map output tracker would only keep one of them. This should not happen after this PR for two reasons: a) we do not clear status until stage finishes (which should be sufficient to prevent the bug in entirety) b) In (other) cases where we allow a task to commit but then kill it (perhaps for other reasons - like user initiated kill, executor pre-emption, etc), the task failure will be recorded and the commit state for that partition will be cleared - and resubmitted task for partition will commit. There is an inherent race in (b) always - where task is killed before task completion and after output commit - that is something we cannot fix, and which is to be handled/resolved by final job commit. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21577: [SPARK-24589][core] Correctly identify tasks in output c...
Github user vanzin commented on the issue: https://github.com/apache/spark/pull/21577 > if its a map stage then I don't expect it to be asking to commit. Ah, right, d'oh. I just checked about whether stages register with the coordinator, and saw the duplicate registration for the resubmitted map stage. I guess we could avoid that too. The scheduler does that explicitly, but if it doesn't serve any purpose, it would save some memory: ``` case s: ShuffleMapStage => outputCommitCoordinator.stageStart(stage = s.id, maxPartitionId = s.numPartitions - 1) ``` --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21577: [SPARK-24589][core] Correctly identify tasks in output c...
Github user tgravescs commented on the issue: https://github.com/apache/spark/pull/21577 > The test I added can sort of illustrate that if you look at what happens. There are two stages (map stage 2, result stage 3), and the fetch failure causes a retry of stage 3 plus a resubmission of stage 2 - which means that stage 2 is starting in the coordinator with a fresh list of committers. > > So if there's still a speculative task from stage 2's first run that hasn't been properly killed yet, it might be allowed to commit. But this being a map stage, I assume the map output tracker would take care of filtering out duplicates? if its a map stage then I don't expect it to be asking to commit. Is there a case you know that does? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21577: [SPARK-24589][core] Correctly identify tasks in output c...
Github user vanzin commented on the issue: https://github.com/apache/spark/pull/21577 I was referring to a race caused by asynchronously killing speculative tasks. Granted it's incredibly unlikely to occur in real life: - in s1a1 1, t1 and t2 are started for the same partition, t1 succeeds, a kill is sent for t2 - s1 finishes, coordinator state is cleared for that stage - s2a1 fails, causes s1 to be re-submitted - t2 finishes before that kill message arrives, is allowed to commit. If that can happen it would generate a duplicate map output; but my guess (hope?) is that the map output tracker would only keep one of them. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21577: [SPARK-24589][core] Correctly identify tasks in output c...
Github user tgravescs commented on the issue: https://github.com/apache/spark/pull/21577 > * fixed the issue Mridul brought up, but I think the race that Tom describes still exists. I'm just not sure it would cause problems, since as far as I can tell it can only happen in a map stage, not a result stage. @vanzin which race were you referring to here, I think tracking the stage across attempts fixes both the ones I mentioned in reference to scenario 2 for Mridul > There is another case though here where T1_1.1 could have just asked to be committed, but not yet committed, then if it gets delayed committing, the new stage attempt starts and T1_1.2 asks if it could commit and is granted, so then both try to commit at the same time causing corruption. Fixed because T1_1.2 won't be allowed to commit because we track first state attempt as committing. > The caveat there though would be if since T1_1.1 was committed, the second stage attempt could finish and call commitJob while T1_1.2 is committing since spark thinks it doesn't need to wait for T1_1.2. Anyway this seems very unlikely but we should protect against it. T1_1.2 shouldn't ever be allowed to commit since we track across the attempts so it wouldn't ever commit after the stage itself has completed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21577: [SPARK-24589][core] Correctly identify tasks in output c...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21577 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21577: [SPARK-24589][core] Correctly identify tasks in output c...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21577 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/4252/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21577: [SPARK-24589][core] Correctly identify tasks in output c...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21577 **[Test build #92138 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92138/testReport)** for PR 21577 at commit [`264c533`](https://github.com/apache/spark/commit/264c533737410786faae24df8cb5b27218f804cd). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21577: [SPARK-24589][core] Correctly identify tasks in output c...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21577 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21577: [SPARK-24589][core] Correctly identify tasks in output c...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21577 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/356/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21577: [SPARK-24589][core] Correctly identify tasks in output c...
Github user tgravescs commented on the issue: https://github.com/apache/spark/pull/21577 test this please --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21577: [SPARK-24589][core] Correctly identify tasks in output c...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21577 Merged build finished. Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21577: [SPARK-24589][core] Correctly identify tasks in output c...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21577 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/92109/ Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21577: [SPARK-24589][core] Correctly identify tasks in output c...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21577 **[Test build #92109 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92109/testReport)** for PR 21577 at commit [`264c533`](https://github.com/apache/spark/commit/264c533737410786faae24df8cb5b27218f804cd). * This patch **fails PySpark unit tests**. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21577: [SPARK-24589][core] Correctly identify tasks in output c...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21577 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/92106/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21577: [SPARK-24589][core] Correctly identify tasks in output c...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21577 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21577: [SPARK-24589][core] Correctly identify tasks in output c...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21577 **[Test build #92106 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92106/testReport)** for PR 21577 at commit [`5ece2f1`](https://github.com/apache/spark/commit/5ece2f12a820d6438146758f0e944f3b1c70d489). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21577: [SPARK-24589][core] Correctly identify tasks in output c...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21577 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/92104/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21577: [SPARK-24589][core] Correctly identify tasks in output c...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21577 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21577: [SPARK-24589][core] Correctly identify tasks in output c...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21577 **[Test build #92104 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92104/testReport)** for PR 21577 at commit [`0e91f1b`](https://github.com/apache/spark/commit/0e91f1b49ee4720c9b89a2090080efd7c89ccaf4). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21577: [SPARK-24589][core] Correctly identify tasks in output c...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21577 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21577: [SPARK-24589][core] Correctly identify tasks in output c...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21577 **[Test build #92109 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92109/testReport)** for PR 21577 at commit [`264c533`](https://github.com/apache/spark/commit/264c533737410786faae24df8cb5b27218f804cd). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21577: [SPARK-24589][core] Correctly identify tasks in output c...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21577 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21577: [SPARK-24589][core] Correctly identify tasks in output c...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21577 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/4233/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21577: [SPARK-24589][core] Correctly identify tasks in output c...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21577 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/337/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21577: [SPARK-24589][core] Correctly identify tasks in output c...
Github user mridulm commented on the issue: https://github.com/apache/spark/pull/21577 Thanks for the changes @vanzin, looks good to me ! Ideally would have been great to test the speculative execution part as well; but that would be fairly nasty to reliably reproduce I guess. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21577: [SPARK-24589][core] Correctly identify tasks in output c...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21577 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/4230/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21577: [SPARK-24589][core] Correctly identify tasks in output c...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21577 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21577: [SPARK-24589][core] Correctly identify tasks in output c...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21577 **[Test build #92106 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92106/testReport)** for PR 21577 at commit [`5ece2f1`](https://github.com/apache/spark/commit/5ece2f12a820d6438146758f0e944f3b1c70d489). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21577: [SPARK-24589][core] Correctly identify tasks in output c...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21577 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21577: [SPARK-24589][core] Correctly identify tasks in output c...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21577 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/334/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21577: [SPARK-24589][core] Correctly identify tasks in output c...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21577 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/92096/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21577: [SPARK-24589][core] Correctly identify tasks in output c...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21577 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21577: [SPARK-24589][core] Correctly identify tasks in output c...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21577 **[Test build #92096 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92096/testReport)** for PR 21577 at commit [`adb0d18`](https://github.com/apache/spark/commit/adb0d18fa1bb43488245b7d6b7ee02d4997d6215). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21577: [SPARK-24589][core] Correctly identify tasks in output c...
Github user vanzin commented on the issue: https://github.com/apache/spark/pull/21577 > I think the commit/delete thing is also an issue for existing v1 and hadoop committers as well. I took a look at that code and I agree with you. It's actually quite annoying how "task id" and "task attempt id" are used interchangeably with "task attempt number" in a number of places when they aren't the same thing. Anyway, I think that should either be a separate change or perhaps integrated with @rdblue's patch, since it's more similar to that one. The fix should actually be quite simple: `createTaskAttemptContext` should take the actual task attempt id (type `Long`) instead of the attempt number as an argument. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21577: [SPARK-24589][core] Correctly identify tasks in output c...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21577 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21577: [SPARK-24589][core] Correctly identify tasks in output c...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21577 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/4226/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21577: [SPARK-24589][core] Correctly identify tasks in output c...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21577 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21577: [SPARK-24589][core] Correctly identify tasks in output c...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21577 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/332/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21577: [SPARK-24589][core] Correctly identify tasks in output c...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21577 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/331/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21577: [SPARK-24589][core] Correctly identify tasks in output c...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21577 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21577: [SPARK-24589][core] Correctly identify tasks in output c...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21577 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/4225/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21577: [SPARK-24589][core] Correctly identify tasks in output c...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21577 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21577: [SPARK-24589][core] Correctly identify tasks in output c...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21577 **[Test build #92104 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92104/testReport)** for PR 21577 at commit [`0e91f1b`](https://github.com/apache/spark/commit/0e91f1b49ee4720c9b89a2090080efd7c89ccaf4). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21577: [SPARK-24589][core] Correctly identify tasks in output c...
Github user vanzin commented on the issue: https://github.com/apache/spark/pull/21577 Also, one idea to try to fix the remaining race would be to move the commit-related state to the `org.apache.spark.scheduler.Stage` class, which is reused across attempts (even in the resubmission case I mentioned), and keep the coordinator class purely for the RPC stuff. But that would be a bigger change. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21577: [SPARK-24589][core] Correctly identify tasks in output c...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21577 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21577: [SPARK-24589][core] Correctly identify tasks in output c...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21577 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/325/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21577: [SPARK-24589][core] Correctly identify tasks in output c...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21577 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21577: [SPARK-24589][core] Correctly identify tasks in output c...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21577 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/4219/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21577: [SPARK-24589][core] Correctly identify tasks in output c...
Github user vanzin commented on the issue: https://github.com/apache/spark/pull/21577 A few notes about the latest updates: - I reverted the `TaskCommitDenied` changes so that this patch can be backported more easily. I'm not against the change but I think it'd be better if we made it only in master, so we can postpone it. The information there is also not critical, since the end reason is generally attached to a task, which has the needed info anyway. - I fixed the issue Mridul brought up, but I think the race that Tom describes still exists. I'm just not sure it would cause problems, since as far as I can tell it can only happen in a map stage, not a result stage. The test I added can sort of illustrate that if you look at what happens. There are two stages (map stage 2, result stage 3), and the fetch failure causes a retry of stage 3 *plus* a resubmission of stage 2 - which means that stage 2 is starting in the committer with a fresh list of committers. So if there's still a speculative task from stage 2 that hasn't been properly killed, it might be allowed to commit. But this being a map stage, I assume the map output tracker would take care of filtering out duplicates? That's obviously really hard to hit, but if it can be an issue we could look at it separately. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21577: [SPARK-24589][core] Correctly identify tasks in output c...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21577 **[Test build #92096 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92096/testReport)** for PR 21577 at commit [`adb0d18`](https://github.com/apache/spark/commit/adb0d18fa1bb43488245b7d6b7ee02d4997d6215). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21577: [SPARK-24589][core] Correctly identify tasks in output c...
Github user tgravescs commented on the issue: https://github.com/apache/spark/pull/21577 I'm fine with separating them but we need a jira or need to update the v2 jira to handle all cases --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21577: [SPARK-24589][core] Correctly identify tasks in output c...
Github user jiangxb1987 commented on the issue: https://github.com/apache/spark/pull/21577 This in general looks good, IMO we shall focus on fixing the output commit coordinator issue in this PR, and discuss the data source issue in a separated thread. I'm OOO this week but will still look into more detail on this issue. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21577: [SPARK-24589][core] Correctly identify tasks in output c...
Github user tgravescs commented on the issue: https://github.com/apache/spark/pull/21577 So I think the commit/delete thing is also an issue for existing v1 and hadoop committers as well. So this doesn't fully solve the problem. spark uses a file format like (HadoopMapReduceWriteConfigUtil/HadoopMapRedWriteConfigUtil): {date}_{rddid}_{m/r}_{partitionid}_{task attempt number} I believe the same fix as the v2 would work using the taskAttemptId instead of the attemptNumber. (see In the case we have the stage failure and a second stage attempt the task attempt number could be the same and thus both tasks write to the same place. If one of them fails or is told not to commit it could delete the output which is being used by both. Need to think through all the scenarios to make sure its covered. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21577: [SPARK-24589][core] Correctly identify tasks in output c...
Github user vanzin commented on the issue: https://github.com/apache/spark/pull/21577 FYI I plan to fix the mima issue later. Haven't decided whether to revert the change or just add excludes... probably the latter since it's a developer api. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org