[GitHub] spark issue #20930: [SPARK-23811][Core] FetchFailed comes before Success of ...

2018-04-26 Thread Ngone51
Github user Ngone51 commented on the issue:

https://github.com/apache/spark/pull/20930
  
No wonder I couldn't understand the issue for so long: I thought it 
happened on Spark 2.3. Now it makes sense. Thanks @jiangxb1987 


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20930: [SPARK-23811][Core] FetchFailed comes before Success of ...

2018-04-26 Thread xuanyuanking
Github user xuanyuanking commented on the issue:

https://github.com/apache/spark/pull/20930
  
> Have you applied this patch: #17955 ?

No, this happened on Spark 2.1. Thanks xingbo & wenchen, I'll backport 
this patch to our internal Spark 2.1.

> That PR seems to be addressing the issue you described:

Yeah, the description is similar to the current scenario, but there's also 
a puzzle about the wrong ShuffleId, and I'm trying to find the reason. Thanks again 
for your help; I'll backport this patch first.


---




[GitHub] spark issue #20930: [SPARK-23811][Core] FetchFailed comes before Success of ...

2018-04-26 Thread jiangxb1987
Github user jiangxb1987 commented on the issue:

https://github.com/apache/spark/pull/20930
  
Have you applied this patch: https://github.com/apache/spark/pull/17955 ?


---




[GitHub] spark issue #20930: [SPARK-23811][Core] FetchFailed comes before Success of ...

2018-04-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20930
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #20930: [SPARK-23811][Core] FetchFailed comes before Success of ...

2018-04-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20930
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89870/
Test FAILed.


---




[GitHub] spark issue #20930: [SPARK-23811][Core] FetchFailed comes before Success of ...

2018-04-26 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20930
  
**[Test build #89870 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89870/testReport)**
 for PR 20930 at commit 
[`fee903c`](https://github.com/apache/spark/commit/fee903c65c59219cdc1c0937ac8be4777142ffbd).
 * This patch **fails due to an unknown error code, -9**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #20930: [SPARK-23811][Core] FetchFailed comes before Success of ...

2018-04-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20930
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2686/
Test PASSed.


---




[GitHub] spark issue #20930: [SPARK-23811][Core] FetchFailed comes before Success of ...

2018-04-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20930
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20930: [SPARK-23811][Core] FetchFailed comes before Success of ...

2018-04-25 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20930
  
**[Test build #89870 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89870/testReport)**
 for PR 20930 at commit 
[`fee903c`](https://github.com/apache/spark/commit/fee903c65c59219cdc1c0937ac8be4777142ffbd).


---




[GitHub] spark issue #20930: [SPARK-23811][Core] FetchFailed comes before Success of ...

2018-04-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20930
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20930: [SPARK-23811][Core] FetchFailed comes before Success of ...

2018-04-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20930
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89850/
Test PASSed.


---




[GitHub] spark issue #20930: [SPARK-23811][Core] FetchFailed comes before Success of ...

2018-04-25 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20930
  
**[Test build #89850 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89850/testReport)**
 for PR 20930 at commit 
[`7f8503f`](https://github.com/apache/spark/commit/7f8503f7f921568a09b967ddf75f2ce2f027e197).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #20930: [SPARK-23811][Core] FetchFailed comes before Success of ...

2018-04-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20930
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20930: [SPARK-23811][Core] FetchFailed comes before Success of ...

2018-04-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20930
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89849/
Test PASSed.


---




[GitHub] spark issue #20930: [SPARK-23811][Core] FetchFailed comes before Success of ...

2018-04-25 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20930
  
**[Test build #89849 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89849/testReport)**
 for PR 20930 at commit 
[`a201764`](https://github.com/apache/spark/commit/a201764c94b21e294f0a32cb71019b422e8d8090).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #20930: [SPARK-23811][Core] FetchFailed comes before Success of ...

2018-04-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20930
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20930: [SPARK-23811][Core] FetchFailed comes before Success of ...

2018-04-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20930
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2676/
Test PASSed.


---




[GitHub] spark issue #20930: [SPARK-23811][Core] FetchFailed comes before Success of ...

2018-04-25 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20930
  
**[Test build #89850 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89850/testReport)**
 for PR 20930 at commit 
[`7f8503f`](https://github.com/apache/spark/commit/7f8503f7f921568a09b967ddf75f2ce2f027e197).


---




[GitHub] spark issue #20930: [SPARK-23811][Core] FetchFailed comes before Success of ...

2018-04-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20930
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2675/
Test PASSed.


---




[GitHub] spark issue #20930: [SPARK-23811][Core] FetchFailed comes before Success of ...

2018-04-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20930
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20930: [SPARK-23811][Core] FetchFailed comes before Success of ...

2018-04-25 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20930
  
**[Test build #89849 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89849/testReport)**
 for PR 20930 at commit 
[`a201764`](https://github.com/apache/spark/commit/a201764c94b21e294f0a32cb71019b422e8d8090).


---




[GitHub] spark issue #20930: [SPARK-23811][Core] FetchFailed comes before Success of ...

2018-04-21 Thread Ngone51
Github user Ngone51 commented on the issue:

https://github.com/apache/spark/pull/20930
  
> because we can get the MapStatus, but get a 'null'. If I'm not mistaken, 
this also because the ExecutorLost trigger removeOutputsOnExecutor

If there's a `null` MapStatus for stage 2, how can it retry 4 times without 
any tasks? IIUC, a `null` MapStatus leads to a missing partition, which means there 
will be some tasks to submit.
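
A toy sketch of that reasoning follows. It is not Spark's code: `find_missing_partitions` is a hypothetical helper, and `None` stands in for a `null` MapStatus. The point is only that any `None` slot yields a partition to recompute, so a stage with a `null` status should not be resubmitted empty.

```python
from typing import List, Optional


def find_missing_partitions(map_statuses: List[Optional[str]]) -> List[int]:
    """Return the indices of map partitions with no registered output.

    Hypothetical helper: each slot holds the executor that owns the output,
    or None when nothing is registered (standing in for a null MapStatus).
    """
    return [i for i, status in enumerate(map_statuses) if status is None]


# A stage with two map partitions; partition 1 has no registered output.
statuses: List[Optional[str]] = ["ExecutorB", None]
missing = find_missing_partitions(statuses)

# A resubmitted stage would launch one task per missing partition.
tasks_to_submit = [f"ShuffleMapTask {p}" for p in missing]
print(tasks_to_submit)
```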

As for stage 3's shuffle Id, that's really weird. Hope you can fix it! 
@xuanyuanking 


---




[GitHub] spark issue #20930: [SPARK-23811][Core] FetchFailed comes before Success of ...

2018-04-21 Thread xuanyuanking
Github user xuanyuanking commented on the issue:

https://github.com/apache/spark/pull/20930
  

![image](https://user-images.githubusercontent.com/4833765/39091106-ff11d0a6-461f-11e8-968f-7fcbe6652bb3.png)

Stages 0/1/2/3 correspond to stages 20/21/22/23 in this screenshot. Stage 2's 
shuffleId is 1 but stage 3's is 0, which shouldn't be possible.

Good description of the scenario. We don't get a FetchFailed because we can 
get the MapStatus, but it is 'null'. If I'm not mistaken, this is also because the 
ExecutorLost event triggers `removeOutputsOnExecutor`.

Happy to discuss this with everyone, and sorry that I can't give more detailed logs 
after checking the root cause: this happened in Baidu's online environment, which can't keep 
all logs for a month. I'll keep working on the case and capture as much detailed 
logging as possible.




---




[GitHub] spark issue #20930: [SPARK-23811][Core] FetchFailed comes before Success of ...

2018-04-21 Thread Ngone51
Github user Ngone51 commented on the issue:

https://github.com/apache/spark/pull/20930
  
Hi @xuanyuanking, sincere thanks for your patient explanation.

With regard to your latest explanation:
 
> stage 2's shuffleID is 1, but stage 3 failed by missing an output for 
shuffle '0'! So here the stage 2's skip cause stage 3 got an error shuffleId.

However, I don't think skipping stage 2 would lead to stage 3 getting a wrong 
shuffleId, since all `ShuffleDependencies` (each constructed with a fixed id) 
for the `ShuffleMapStages` are created before any stage of the job is submitted. 

After struggling to understand this issue for a while, I finally arrived at my 
own inference:

(Assume the two ShuffleMapTasks below belong to stage 2, and stage 2 has 
two partitions on the map side. Stage 2 has a parent stage, stage 1, and a 
child stage, stage 3.)

1. ShuffleMapTask 0.0 runs on ExecutorB, writes its map output on ExecutorB, 
and succeeds normally.
Now there is only 1 available map output registered on 
`MapOutputTrackerMaster`.

2. ShuffleMapTask 1.0 is running on ExecutorA; it fetches data from 
ExecutorA and writes its map output on ExecutorA, too.

3. ExecutorA is lost for an unknown reason after sending the `StatusUpdate` message 
to the driver, which reports ShuffleMapTask 1.0's success. All map outputs on 
ExecutorA are lost, including ShuffleMapTask 1.0's map output.

4. The driver launches a speculative ShuffleMapTask 1.1 before it receives 
the `StatusUpdate` message, and ShuffleMapTask 1.1 gets a FetchFailed immediately.

5. `DAGScheduler` handles the FetchFailed from ShuffleMapTask 1.1 first and marks 
stage 2 and its parent stage 1 as failed. Stage 1 and stage 2 are now waiting 
to be resubmitted.

6. `DAGScheduler` handles the success of ShuffleMapTask 1.0 before stage 1 and 
stage 2 are resubmitted, which triggers `MapOutputTrackerMaster.registerMapOutput`. 
Now there are 2 available map outputs registered on `MapOutputTrackerMaster` 
(even though ShuffleMapTask 1.0's map output on ExecutorA has been lost).

7. Stage 1 is resubmitted and succeeds normally.

8. Stage 2 is resubmitted. Since stage 2 has 2 available map outputs registered 
on `MapOutputTrackerMaster`, there are no missing partitions for stage 2, 
and thus no missing tasks to submit. 

9. Then stage 3 is submitted. Since stage 2's map output file on 
ExecutorA is lost, stage 3 must eventually get a FetchFailed. Then stage 2 
and stage 3 are resubmitted, and we get into a loop until stage 3 aborts.
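
The ordering in the steps above can be replayed with a small toy model. This is only a sketch under stated assumptions: `ToyTracker` is a hypothetical stand-in for the driver-side output registry, not Spark's `MapOutputTrackerMaster`, and events are applied by hand in the order the steps describe.

```python
from typing import List, Optional, Set


class ToyTracker:
    """Hypothetical stand-in for the driver's map output registry."""

    def __init__(self, num_partitions: int) -> None:
        self.outputs: List[Optional[str]] = [None] * num_partitions

    def register(self, partition: int, executor: str) -> None:
        self.outputs[partition] = executor

    def missing(self) -> List[int]:
        return [p for p, loc in enumerate(self.outputs) if loc is None]


tracker = ToyTracker(num_partitions=2)  # stage 2 has two map partitions
live_executors: Set[str] = {"ExecutorA", "ExecutorB"}
failed_stages: Set[int] = set()

tracker.register(0, "ExecutorB")      # step 1: task 0.0 succeeds
live_executors.discard("ExecutorA")   # step 3: ExecutorA is lost
failed_stages.update({1, 2})          # step 5: FetchFailed of task 1.1 handled first
tracker.register(1, "ExecutorA")      # step 6: late success of task 1.0 is registered

# Step 8: the resubmitted stage 2 sees nothing missing, so it runs no tasks.
stage2_missing = tracker.missing()

# Step 9: stage 3 still cannot read the output that lived on the lost executor.
unreachable = [p for p, loc in enumerate(tracker.outputs)
               if loc not in live_executors]
print(stage2_missing, unreachable)
```

In this model the registry looks complete while one registered location is dead, which is the loop described in steps 8 and 9.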

But if the issue were what I described above, we should get a 
`FetchFailedException` instead of the `MetadataFetchFailedException` shown in 
the screenshot. So this point still doesn't make sense to me. 
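
To make the distinction concrete, here is a toy sketch, assuming (as I understand the behavior) that a metadata fetch failure means the tracker has no location registered at all, while a plain fetch failure means a location is registered but the data cannot be read from it. The exception classes and `check_map_outputs` are hypothetical illustrations, not Spark's classes.

```python
from typing import Dict, List, Optional, Set


class FetchFailed(Exception):
    """Toy analogue: a location is registered, but fetching from it fails."""


class MetadataFetchFailed(FetchFailed):
    """Toy analogue: no output location is registered for the shuffle."""


def check_map_outputs(shuffle_id: int,
                      registry: Dict[int, List[Optional[str]]],
                      live: Set[str]) -> None:
    locations = registry.get(shuffle_id)
    if locations is None or any(loc is None for loc in locations):
        raise MetadataFetchFailed(
            f"Missing an output location for shuffle {shuffle_id}")
    for loc in locations:
        if loc not in live:
            raise FetchFailed(f"Failed to fetch from {loc}")


def classify(shuffle_id: int,
             registry: Dict[int, List[Optional[str]]],
             live: Set[str]) -> str:
    try:
        check_map_outputs(shuffle_id, registry, live)
        return "ok"
    except MetadataFetchFailed:  # must be caught before its parent class
        return "metadata fetch failure"
    except FetchFailed:
        return "fetch failure"


registry: Dict[int, List[Optional[str]]] = {1: ["ExecutorB", "ExecutorA"]}
live = {"ExecutorB"}
stale_location = classify(1, registry, live)   # registered on a dead executor
unknown_shuffle = classify(0, registry, live)  # nothing registered for shuffle 0
print(stale_location, unknown_shuffle)
```

Under these assumptions, the nine-step scenario would produce the first kind of failure, whereas looking up a shuffle id with nothing registered produces the second.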

Please feel free to point out where I'm wrong.

Anyway, thanks again.



---




[GitHub] spark issue #20930: [SPARK-23811][Core] FetchFailed comes before Success of ...

2018-04-20 Thread xuanyuanking
Github user xuanyuanking commented on the issue:

https://github.com/apache/spark/pull/20930
  
@Ngone51 Ah, I think I see how the description misled you: in step 5 of the 
description, 'this stage' refers to 'Stage 2' in the screenshot. Thanks for 
checking; I've modified the description to avoid misleading others.


---




[GitHub] spark issue #20930: [SPARK-23811][Core] FetchFailed comes before Success of ...

2018-04-20 Thread xuanyuanking
Github user xuanyuanking commented on the issue:

https://github.com/apache/spark/pull/20930
  
@Ngone51 
If you check the screenshot in detail, stage 2's shuffleId is 1, but stage 
3 failed because it was missing an output for shuffle '0'! So here skipping stage 2 
caused stage 3 to get a wrong shuffleId. The root cause is what this patch wants to fix: 
there should be missing tasks, but actually there are none. 


---




[GitHub] spark issue #20930: [SPARK-23811][Core] FetchFailed comes before Success of ...

2018-04-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20930
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20930: [SPARK-23811][Core] FetchFailed comes before Success of ...

2018-04-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20930
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89479/
Test PASSed.


---




[GitHub] spark issue #20930: [SPARK-23811][Core] FetchFailed comes before Success of ...

2018-04-18 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20930
  
**[Test build #89479 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89479/testReport)**
 for PR 20930 at commit 
[`ba6f71a`](https://github.com/apache/spark/commit/ba6f71a0fc49ce2a07addec3496177c4b2b43fef).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #20930: [SPARK-23811][Core] FetchFailed comes before Success of ...

2018-04-17 Thread Ngone51
Github user Ngone51 commented on the issue:

https://github.com/apache/spark/pull/20930
  
Hi, @xuanyuanking , I'm still confused (smile & cry). 
> Stage 2 retry 4 times triggered by Stage 3's fetch failed event. Actually 
in this scenario, stage 3 will always failed by fetch fail.

Stage 2 has no missing tasks, right? So there are no missing partitions for 
Stage 2 (which means Stage 3 can always get Stage 2's MapOutputs from 
`MapOutputTrackerMaster`), right? Then why will Stage 3 always fail with a 
FetchFailed?
 
Hope you can explain more. Thank you very much!



---




[GitHub] spark issue #20930: [SPARK-23811][Core] FetchFailed comes before Success of ...

2018-04-17 Thread xuanyuanking
Github user xuanyuanking commented on the issue:

https://github.com/apache/spark/pull/20930
  
@Ngone51 Thanks for your review.
> Does stage 2 is correspond to the never success stage in PR description ?

Stage 3 is the stage that never succeeds; stage 2 is its parent stage.

> So, why stage 2 retry 4 times when there's no more missing tasks?

Stage 2's 4 retries were triggered by Stage 3's fetch-failed events. In this 
scenario, stage 3 will always fail with a fetch failure.


---




[GitHub] spark issue #20930: [SPARK-23811][Core] FetchFailed comes before Success of ...

2018-04-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20930
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20930: [SPARK-23811][Core] FetchFailed comes before Success of ...

2018-04-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20930
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2411/
Test PASSed.


---




[GitHub] spark issue #20930: [SPARK-23811][Core] FetchFailed comes before Success of ...

2018-04-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20930
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #20930: [SPARK-23811][Core] FetchFailed comes before Success of ...

2018-04-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20930
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2410/
Test FAILed.


---




[GitHub] spark issue #20930: [SPARK-23811][Core] FetchFailed comes before Success of ...

2018-04-17 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20930
  
**[Test build #89479 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89479/testReport)**
 for PR 20930 at commit 
[`ba6f71a`](https://github.com/apache/spark/commit/ba6f71a0fc49ce2a07addec3496177c4b2b43fef).


---




[GitHub] spark issue #20930: [SPARK-23811][Core] FetchFailed comes before Success of ...

2018-04-16 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20930
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20930: [SPARK-23811][Core] FetchFailed comes before Success of ...

2018-04-16 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20930
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89389/
Test PASSed.


---




[GitHub] spark issue #20930: [SPARK-23811][Core] FetchFailed comes before Success of ...

2018-04-16 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20930
  
**[Test build #89389 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89389/testReport)**
 for PR 20930 at commit 
[`0defc09`](https://github.com/apache/spark/commit/0defc09dbcbd0b227eab583d0426b5dc78232b37).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #20930: [SPARK-23811][Core] FetchFailed comes before Success of ...

2018-04-16 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20930
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2339/
Test PASSed.


---




[GitHub] spark issue #20930: [SPARK-23811][Core] FetchFailed comes before Success of ...

2018-04-16 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20930
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20930: [SPARK-23811][Core] FetchFailed comes before Success of ...

2018-04-16 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20930
  
**[Test build #89389 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89389/testReport)**
 for PR 20930 at commit 
[`0defc09`](https://github.com/apache/spark/commit/0defc09dbcbd0b227eab583d0426b5dc78232b37).


---




[GitHub] spark issue #20930: [SPARK-23811][Core] FetchFailed comes before Success of ...

2018-04-16 Thread xuanyuanking
Github user xuanyuanking commented on the issue:

https://github.com/apache/spark/pull/20930
  
retest this please


---




[GitHub] spark issue #20930: [SPARK-23811][Core] FetchFailed comes before Success of ...

2018-04-16 Thread xuanyuanking
Github user xuanyuanking commented on the issue:

https://github.com/apache/spark/pull/20930
  
@cloud-fan @jiangxb1987 
Sorry for the late reply. I've deleted the useless code, as we discussed before. 


---




[GitHub] spark issue #20930: [SPARK-23811][Core] FetchFailed comes before Success of ...

2018-04-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20930
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88806/
Test FAILed.


---




[GitHub] spark issue #20930: [SPARK-23811][Core] FetchFailed comes before Success of ...

2018-04-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20930
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #20930: [SPARK-23811][Core] FetchFailed comes before Success of ...

2018-04-02 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20930
  
**[Test build #88806 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88806/testReport)**
 for PR 20930 at commit 
[`08f6930`](https://github.com/apache/spark/commit/08f693017b01935eb2ae4f785ddb0cf9f9142125).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #20930: [SPARK-23811][Core] FetchFailed comes before Success of ...

2018-04-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20930
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1895/
Test PASSed.


---




[GitHub] spark issue #20930: [SPARK-23811][Core] FetchFailed comes before Success of ...

2018-04-02 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20930
  
**[Test build #88806 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88806/testReport)**
 for PR 20930 at commit 
[`08f6930`](https://github.com/apache/spark/commit/08f693017b01935eb2ae4f785ddb0cf9f9142125).


---




[GitHub] spark issue #20930: [SPARK-23811][Core] FetchFailed comes before Success of ...

2018-04-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20930
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20930: [SPARK-23811][Core] FetchFailed comes before Success of ...

2018-03-31 Thread xuanyuanking
Github user xuanyuanking commented on the issue:

https://github.com/apache/spark/pull/20930
  
> What's your proposed fix?
I fix this by killing the other attempts when a FetchFailed is received in 
`TaskSetManager`. If we are going to ignore the success events of the other 
attempts anyway, we might as well stop those tasks.
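
The idea can be sketched roughly as below. This is a simplified, hypothetical model: `ToyTaskSetManager` and `Attempt` are illustrative names and do not reflect the real `TaskSetManager` API or the actual patch.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Attempt:
    task_index: int
    attempt_number: int
    running: bool = True
    killed: bool = False


class ToyTaskSetManager:
    """Hypothetical, simplified model of per-task attempt bookkeeping."""

    def __init__(self) -> None:
        self.attempts: Dict[int, List[Attempt]] = {}

    def launch(self, task_index: int) -> Attempt:
        existing = self.attempts.setdefault(task_index, [])
        attempt = Attempt(task_index, attempt_number=len(existing))
        existing.append(attempt)
        return attempt

    def handle_fetch_failed(self, failed: Attempt) -> None:
        """Mark the failed attempt done and kill its still-running siblings."""
        failed.running = False
        for other in self.attempts[failed.task_index]:
            if other is not failed and other.running:
                other.running = False
                other.killed = True


manager = ToyTaskSetManager()
original = manager.launch(1)       # plays the role of ShuffleMapTask 1.0
speculative = manager.launch(1)    # plays the role of ShuffleMapTask 1.1
manager.handle_fetch_failed(speculative)
print(original.killed, original.running)
```

Once the original attempt is killed, it can no longer report a late success for an output that may already be gone.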


---





[GitHub] spark issue #20930: [SPARK-23811][Core] FetchFailed comes before Success of ...

2018-03-30 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/20930
  
What's your proposed fix? It sounds like we can just ignore `ShuffleMapTask 
1.0` if the stage is marked as failed.
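
A minimal sketch of that alternative, assuming a toy driver-side handler (`handle_task_success`, `failed_stages`, and `outputs` are hypothetical names, not `DAGScheduler` internals):

```python
from typing import List, Optional, Set

failed_stages: Set[int] = {2}                      # stage 2 is marked as failed
outputs: List[Optional[str]] = ["ExecutorB", None]  # registered map outputs of stage 2


def handle_task_success(stage_id: int, partition: int, executor: str) -> bool:
    """Register a map output unless its stage is already marked as failed.

    Returns True when the output was registered, False when it was ignored.
    """
    if stage_id in failed_stages:
        return False
    outputs[partition] = executor
    return True


# The late success of the task on partition 1 arrives after the stage failed.
registered = handle_task_success(stage_id=2, partition=1, executor="ExecutorA")
print(registered, outputs)
```

Because partition 1 stays unregistered here, the resubmitted stage would still see it as missing and rerun it.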


---




[GitHub] spark issue #20930: [SPARK-23811][Core] FetchFailed comes before Success of ...

2018-03-30 Thread xuanyuanking
Github user xuanyuanking commented on the issue:

https://github.com/apache/spark/pull/20930
  
Yeah, the stage is resubmitted, but there's no missing task for this stage, so 
no task is actually resubmitted. This is mainly because `ShuffleMapTask 
1.0` triggered `shuffleStage.addOutputLoc`.
The screenshots I attached in Jira may help to explain this scenario.

![image](https://user-images.githubusercontent.com/4833765/38135625-c54309f2-344b-11e8-850e-9f99dc2b28a0.png)

![image](https://user-images.githubusercontent.com/4833765/38135635-d17360aa-344b-11e8-8328-a386c22f966a.png)
You can see the empty ShuffleMapStage 2 retried 4 times, and finally its child 
stage 3 failed with FetchFailed.



---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20930: [SPARK-23811][Core] FetchFailed comes before Success of ...

2018-03-30 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/20930
  
Then why is it a problem? The stage should be resubmitted soon, and 
`ShuffleMapTask 1.0` should be a no-op.


---




[GitHub] spark issue #20930: [SPARK-23811][Core] FetchFailed comes before Success of ...

2018-03-30 Thread xuanyuanking
Github user xuanyuanking commented on the issue:

https://github.com/apache/spark/pull/20930
  
The first case: the stage is marked as failed but has not been resubmitted yet.


---




[GitHub] spark issue #20930: [SPARK-23811][Core] FetchFailed comes before Success of ...

2018-03-30 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/20930
  
What happened to `ShuffleMapTask 1.0` exactly? There are several possible cases: 
the stage is marked as failed but has not been resubmitted yet, the stage has been 
resubmitted, or the stage is aborted.


---




[GitHub] spark issue #20930: [SPARK-23811][Core] FetchFailed comes before Success of ...

2018-03-30 Thread xuanyuanking
Github user xuanyuanking commented on the issue:

https://github.com/apache/spark/pull/20930
  
`ShuffleMapTask 1.0` succeeded after its speculative attempt failed with 
FetchFailed. Thanks for checking; I will update the PR description.


---




[GitHub] spark issue #20930: [SPARK-23811][Core] FetchFailed comes before Success of ...

2018-03-30 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/20930
  
What happened to `ShuffleMapTask 1.0`? I don't get it from your PR 
description.


---
