[GitHub] spark issue #20930: [SPARK-23811][Core] FetchFailed comes before Success of ...
Github user Ngone51 commented on the issue: https://github.com/apache/spark/pull/20930 No wonder I couldn't understand the issue for so long: I had assumed it happened on Spark 2.3. Now it makes sense. Thanks @jiangxb1987 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Github user xuanyuanking commented on the issue: https://github.com/apache/spark/pull/20930

> Have you applied this patch: #17955 ?

No, this happened on Spark 2.1. Thanks xingbo & wenchen, I'll backport this patch to our internal Spark 2.1.

> That PR seems to be addressing the issue you described:

Yes, the description is similar to the current scenario, but there is still a puzzle about the wrong shuffle ID, and I'm trying to find the reason. Thanks again for your help; I'll backport this patch first.
Github user jiangxb1987 commented on the issue: https://github.com/apache/spark/pull/20930 Have you applied this patch: https://github.com/apache/spark/pull/17955 ?
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20930 Merged build finished. Test FAILed.
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20930 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89870/ Test FAILed.
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20930 **[Test build #89870 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89870/testReport)** for PR 20930 at commit [`fee903c`](https://github.com/apache/spark/commit/fee903c65c59219cdc1c0937ac8be4777142ffbd). * This patch **fails due to an unknown error code, -9**. * This patch merges cleanly. * This patch adds no public classes.
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20930 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2686/ Test PASSed.
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20930 Merged build finished. Test PASSed.
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20930 **[Test build #89870 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89870/testReport)** for PR 20930 at commit [`fee903c`](https://github.com/apache/spark/commit/fee903c65c59219cdc1c0937ac8be4777142ffbd).
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20930 Merged build finished. Test PASSed.
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20930 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89850/ Test PASSed.
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20930 **[Test build #89850 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89850/testReport)** for PR 20930 at commit [`7f8503f`](https://github.com/apache/spark/commit/7f8503f7f921568a09b967ddf75f2ce2f027e197). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20930 Merged build finished. Test PASSed.
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20930 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89849/ Test PASSed.
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20930 **[Test build #89849 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89849/testReport)** for PR 20930 at commit [`a201764`](https://github.com/apache/spark/commit/a201764c94b21e294f0a32cb71019b422e8d8090). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20930 Merged build finished. Test PASSed.
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20930 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2676/ Test PASSed.
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20930 **[Test build #89850 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89850/testReport)** for PR 20930 at commit [`7f8503f`](https://github.com/apache/spark/commit/7f8503f7f921568a09b967ddf75f2ce2f027e197).
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20930 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2675/ Test PASSed.
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20930 Merged build finished. Test PASSed.
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20930 **[Test build #89849 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89849/testReport)** for PR 20930 at commit [`a201764`](https://github.com/apache/spark/commit/a201764c94b21e294f0a32cb71019b422e8d8090).
Github user Ngone51 commented on the issue: https://github.com/apache/spark/pull/20930

> because we can get the MapStatus, but get a 'null'. If I'm not mistaken, this also because the ExecutorLost trigger removeOutputsOnExecutor

If there's a `null` MapStatus for stage 2, how can it retry 4 times without any tasks? IIUC, a `null` MapStatus leads to a missing partition, which means there will be some tasks to submit. As for stage 3's shuffle ID, that's really weird. Hope you can fix it! @xuanyuanking
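The point about a `null` MapStatus can be illustrated with a toy sketch. This is not Spark code: `find_missing_partitions` and the list-of-locations representation are hypothetical stand-ins, assuming that a partition counts as missing exactly when no map output is registered for it.

```python
from typing import List, Optional


def find_missing_partitions(map_statuses: List[Optional[str]]) -> List[int]:
    """Return the partition ids with no registered map output (status is None).

    Hypothetical helper: each entry is the executor holding that partition's
    output, or None when the output was never registered or has been removed.
    """
    return [pid for pid, status in enumerate(map_statuses) if status is None]


# Partition 1 has a null status, so the stage would have one task to submit.
statuses = ["ExecutorB", None]
print(find_missing_partitions(statuses))
```

Under this assumption, a stage with a `null` status always has at least one task to resubmit, which is why a retry with zero tasks looks inconsistent with a `null` MapStatus.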
Github user xuanyuanking commented on the issue: https://github.com/apache/spark/pull/20930 ![image](https://user-images.githubusercontent.com/4833765/39091106-ff11d0a6-461f-11e8-968f-7fcbe6652bb3.png) Stages 0/1/2/3 are the same as stages 20/21/22/23 in this screenshot. Stage 2's shuffleId is 1 but stage 3's is 0, which shouldn't be possible. Good description of the scenario. We can't get a FetchFailed because we can get the MapStatus, but we get a 'null'. If I'm not mistaken, this is also because ExecutorLost triggers `removeOutputsOnExecutor`. Happy to discuss with everyone, and sorry that I can't give a more detailed log after checking the root cause: this happened in Baidu's online environment, and we can't keep all logs for a month. I'll keep working on the case and capture as much detailed log as possible.
Github user Ngone51 commented on the issue: https://github.com/apache/spark/pull/20930

Hi @xuanyuanking, sincere thanks for your patient explanation. With regard to your latest explanation:

> stage 2's shuffleID is 1, but stage 3 failed by missing an output for shuffle '0'! So here the stage 2's skip cause stage 3 got an error shuffleId.

However, I don't think skipping stage 2 would cause stage 3 to get a wrong shuffleId, as we've already created all `ShuffleDependencies` (constructed with fixed ids) for the `ShuffleMapStages` before any stage of a job is submitted.

After struggling to understand this issue for a while, I finally arrived at my own inference. (Assume the 2 ShuffleMapTasks below belong to stage 2, and stage 2 has two partitions on the map side. Stage 2 has a parent stage named stage 1 and a child stage named stage 3.)

1. ShuffleMapTask 0.0 runs on ExecutorB, writes its map output on ExecutorB, and succeeds normally. Now there is only 1 available map output registered on `MapOutputTrackerMaster`.
2. ShuffleMapTask 1.0 is running on ExecutorA; it fetches data from ExecutorA and writes its map output on ExecutorA, too.
3. ExecutorA is lost for an unknown reason after sending the `StatusUpdate` message to the driver, which reports ShuffleMapTask 1.0's success. All map outputs on ExecutorA are lost, including ShuffleMapTask 1.0's map output.
4. The driver launches a speculative ShuffleMapTask 1.1 before it receives the `StatusUpdate` message, and ShuffleMapTask 1.1 gets a FetchFailed immediately.
5. `DAGScheduler` handles the FetchFailed from ShuffleMapTask 1.1 first and marks stage 2 and its parent stage 1 as failed. Stage 1 and stage 2 are now waiting to be resubmitted.
6. `DAGScheduler` handles the success of ShuffleMapTask 1.0 before stage 1 and stage 2 are resubmitted, which triggers `MapOutputTrackerMaster.registerMapOutput`. Now there are 2 available map outputs registered on `MapOutputTrackerMaster` (even though ShuffleMapTask 1.0's map output on ExecutorA has been lost).
7. Stage 1 is resubmitted and succeeds normally.
8. Stage 2 is resubmitted. As stage 2 has 2 available map outputs registered on `MapOutputTrackerMaster`, there are no missing partitions for stage 2, and thus no missing tasks to submit either.
9. Then we submit stage 3. As stage 2's map output file on ExecutorA is lost, stage 3 must eventually get a FetchFailed. Then we resubmit stage 2 and stage 3, and we get into a loop until stage 3 aborts.

But if the issue is what I described above, we should get `FetchFailedException` instead of the `MetadataFetchFailedException` shown in the screenshot. So at this point it doesn't make sense. Please feel free to point out where I'm wrong. Anyway, thanks again.
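The event ordering in steps 1 to 9 can be replayed in a toy model. This is a minimal sketch, not Spark's scheduler: `ToyMapOutputRegistry` and `replay_scenario` are hypothetical names, and the model assumes the registry records an output location without checking whether the executor is still alive.

```python
from typing import List, Optional, Set, Tuple


class ToyMapOutputRegistry:
    """Hypothetical stand-in for the driver-side registry of one shuffle's map outputs."""

    def __init__(self, num_partitions: int) -> None:
        self.locations: List[Optional[str]] = [None] * num_partitions

    def register(self, partition: int, executor: str) -> None:
        # Assumption: no liveness check is made when an output is registered.
        self.locations[partition] = executor

    def missing_partitions(self) -> List[int]:
        return [p for p, loc in enumerate(self.locations) if loc is None]


def replay_scenario() -> Tuple[List[int], List[int]]:
    registry = ToyMapOutputRegistry(num_partitions=2)
    live_executors: Set[str] = {"ExecutorA", "ExecutorB"}
    failed_stages: Set[int] = set()

    registry.register(0, "ExecutorB")    # step 1: task 0.0 succeeds on ExecutorB
    live_executors.discard("ExecutorA")  # step 3: ExecutorA is lost with its files
    failed_stages.add(2)                 # step 5: FetchFailed from task 1.1 handled first
    # step 6: the late success of task 1.0 is handled next; nothing in this
    # model consults failed_stages, so the lost output is registered anyway.
    registry.register(1, "ExecutorA")

    tasks_to_resubmit = registry.missing_partitions()  # step 8
    stale_outputs = [
        p for p, loc in enumerate(registry.locations)
        if loc is not None and loc not in live_executors
    ]                                                   # step 9: what stage 3 will hit
    return tasks_to_resubmit, stale_outputs


tasks_to_resubmit, stale_outputs = replay_scenario()
print(tasks_to_resubmit, stale_outputs)
```

In this model the resubmitted stage has nothing to run while partition 1 still points at a dead executor, which matches the loop described in steps 8 and 9.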
Github user xuanyuanking commented on the issue: https://github.com/apache/spark/pull/20930 @Ngone51 Ah, maybe I see how the description misled you: in step 5 of the description, 'this stage' refers to 'Stage 2' in the screenshot. Thanks for checking; I've modified the description to avoid misleading others.
Github user xuanyuanking commented on the issue: https://github.com/apache/spark/pull/20930 @Ngone51 You can check the screenshot in detail: stage 2's shuffleID is 1, but stage 3 failed by missing an output for shuffle '0'! So here the skipping of stage 2 caused stage 3 to get a wrong shuffleId. The root cause is what this patch wants to fix: there should be missing tasks, but actually there are none.
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20930 Merged build finished. Test PASSed.
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20930 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89479/ Test PASSed.
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20930 **[Test build #89479 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89479/testReport)** for PR 20930 at commit [`ba6f71a`](https://github.com/apache/spark/commit/ba6f71a0fc49ce2a07addec3496177c4b2b43fef). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
Github user Ngone51 commented on the issue: https://github.com/apache/spark/pull/20930

Hi @xuanyuanking, I'm still confused (smile & cry).

> Stage 2 retry 4 times triggered by Stage 3's fetch failed event. Actually in this scenario, stage 3 will always failed by fetch fail.

Stage 2 has no missing tasks, right? So there are no missing partitions for Stage 2 (which means Stage 3 can always get Stage 2's MapOutputs from `MapOutputTrackerMaster`), right? So why will Stage 3 always fail with FetchFailed? Hope you can explain more. Thank you very much!
Github user xuanyuanking commented on the issue: https://github.com/apache/spark/pull/20930

@Ngone51 Thanks for your review.

> Does stage 2 is correspond to the never success stage in PR description ?

Stage 3 is the stage that never succeeds; stage 2 is its parent stage.

> So, why stage 2 retry 4 times when there's no more missing tasks?

Stage 2's 4 retries were triggered by Stage 3's fetch-failed events. Actually, in this scenario, stage 3 will always fail with a fetch failure.
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20930 Merged build finished. Test PASSed.
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20930 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2411/ Test PASSed.
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20930 Merged build finished. Test FAILed.
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20930 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2410/ Test FAILed.
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20930 **[Test build #89479 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89479/testReport)** for PR 20930 at commit [`ba6f71a`](https://github.com/apache/spark/commit/ba6f71a0fc49ce2a07addec3496177c4b2b43fef).
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20930 Merged build finished. Test PASSed.
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20930 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89389/ Test PASSed.
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20930 **[Test build #89389 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89389/testReport)** for PR 20930 at commit [`0defc09`](https://github.com/apache/spark/commit/0defc09dbcbd0b227eab583d0426b5dc78232b37). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20930 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2339/ Test PASSed.
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20930 Merged build finished. Test PASSed.
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20930 **[Test build #89389 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89389/testReport)** for PR 20930 at commit [`0defc09`](https://github.com/apache/spark/commit/0defc09dbcbd0b227eab583d0426b5dc78232b37).
Github user xuanyuanking commented on the issue: https://github.com/apache/spark/pull/20930 retest this please
Github user xuanyuanking commented on the issue: https://github.com/apache/spark/pull/20930 @cloud-fan @jiangxb1987 Sorry for the late reply. I deleted the useless code as we discussed before.
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20930 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88806/ Test FAILed.
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20930 Merged build finished. Test FAILed.
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20930 **[Test build #88806 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88806/testReport)** for PR 20930 at commit [`08f6930`](https://github.com/apache/spark/commit/08f693017b01935eb2ae4f785ddb0cf9f9142125). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20930 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1895/ Test PASSed.
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20930 **[Test build #88806 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88806/testReport)** for PR 20930 at commit [`08f6930`](https://github.com/apache/spark/commit/08f693017b01935eb2ae4f785ddb0cf9f9142125).
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20930 Merged build finished. Test PASSed.
Github user xuanyuanking commented on the issue: https://github.com/apache/spark/pull/20930

> What's your proposed fix?

I fix this by killing the other attempts when a FetchFailed is received in `TaskSetManager`. If we are going to ignore the success events of the other attempts in the end, we might as well stop those tasks.
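The idea of killing other attempts on a FetchFailed can be sketched in a toy model. This is not the patch itself and not Spark's `TaskSetManager` API: `Attempt`, `ToyTaskSet`, and `handle_fetch_failed` are hypothetical names, and the sketch assumes "other attempts" means still-running attempts of the same task index.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Attempt:
    task_index: int
    attempt_number: int
    running: bool = True
    killed_reason: str = ""


class ToyTaskSet:
    """Hypothetical, heavily simplified stand-in for a task set's attempt bookkeeping."""

    def __init__(self) -> None:
        self.attempts: List[Attempt] = []

    def launch(self, task_index: int) -> Attempt:
        # The attempt number is the count of earlier attempts for this task index.
        number = sum(1 for a in self.attempts if a.task_index == task_index)
        attempt = Attempt(task_index, number)
        self.attempts.append(attempt)
        return attempt

    def handle_fetch_failed(self, failed: Attempt) -> List[Attempt]:
        """Mark the failed attempt as finished and kill its running siblings."""
        failed.running = False
        killed = []
        for a in self.attempts:
            if a is not failed and a.task_index == failed.task_index and a.running:
                a.running = False
                a.killed_reason = "another attempt hit FetchFailed"
                killed.append(a)
        return killed


task_set = ToyTaskSet()
original = task_set.launch(1)     # plays the role of ShuffleMapTask 1.0
speculative = task_set.launch(1)  # plays the role of ShuffleMapTask 1.1
killed = task_set.handle_fetch_failed(speculative)
print([(a.task_index, a.attempt_number) for a in killed])
```

Killing the sibling stops work whose result would be discarded anyway; it does not by itself cover a success report that is already in flight, which a check at the point where the success is handled would still need to catch.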
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/20930 What's your proposed fix? It sounds like we can just ignore `ShuffleMapTask 1.0` if the stage is marked as failed.
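The alternative of ignoring a late success can be sketched as a guard at the point where the success event is handled. This is a minimal illustration under assumed names: `handle_success`, `failed_stages`, and the list of output locations are hypothetical and do not correspond to Spark's `DAGScheduler` signatures.

```python
from typing import List, Optional, Set


def handle_success(
    locations: List[Optional[str]],
    failed_stages: Set[int],
    stage_id: int,
    partition: int,
    executor: str,
) -> bool:
    """Register a map output unless its stage is currently marked as failed.

    Returns True when the output was registered and False when it was ignored.
    """
    if stage_id in failed_stages:
        # The stage is waiting to be resubmitted, so a late success is dropped
        # and the partition stays missing for the next attempt.
        return False
    locations[partition] = executor
    return True


locations: List[Optional[str]] = ["ExecutorB", None]
registered = handle_success(locations, {2}, stage_id=2, partition=1, executor="ExecutorA")
print(registered, locations)
```

With the guard in place, partition 1 stays unregistered, so the resubmitted stage still has a task to run.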
Github user xuanyuanking commented on the issue: https://github.com/apache/spark/pull/20930 Yeah, the stage was resubmitted, but there were no missing tasks for this stage, so no task was actually resubmitted. This is mainly because `ShuffleMapTask 1.0` triggered `shuffleStage.addOutputLoc`. The screenshots I attached in Jira may help to explain this scenario. ![image](https://user-images.githubusercontent.com/4833765/38135625-c54309f2-344b-11e8-850e-9f99dc2b28a0.png) ![image](https://user-images.githubusercontent.com/4833765/38135635-d17360aa-344b-11e8-8328-a386c22f966a.png) You can see that the empty ShuffleMapStage 2 retried 4 times, and finally its child stage 3 failed with FetchFailed.
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/20930 Then why is it a problem? The stage should be resubmitted soon, and `ShuffleMapTask 1.0` should be a no-op.
Github user xuanyuanking commented on the issue: https://github.com/apache/spark/pull/20930 The first case: the stage is marked as failed but has not been resubmitted yet.
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/20930 What exactly happened to `ShuffleMapTask 1.0`? There are 3 cases: the stage is marked as failed but has not been resubmitted yet, the stage has been resubmitted, or the stage is aborted.
Github user xuanyuanking commented on the issue: https://github.com/apache/spark/pull/20930 `ShuffleMapTask 1.0` succeeded after its speculative task failed with FetchFailed. Thanks for checking; I will modify the PR description.
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/20930 What happened to `ShuffleMapTask 1.0`? I don't get it from your PR description.