[GitHub] spark issue #21577: [SPARK-24589][core] Correctly identify tasks in output c...

2018-06-22 Thread rdblue
Github user rdblue commented on the issue:

https://github.com/apache/spark/pull/21577
  
Thanks for fixing this, @vanzin!


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21577: [SPARK-24589][core] Correctly identify tasks in output c...

2018-06-22 Thread tgravescs
Github user tgravescs commented on the issue:

https://github.com/apache/spark/pull/21577
  
Yeah sorry about that, my fault.  I merged the fix - SPARK-22897


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21577: [SPARK-24589][core] Correctly identify tasks in output c...

2018-06-21 Thread zzcclp
Github user zzcclp commented on the issue:

https://github.com/apache/spark/pull/21577
  
@vanzin @tgravescs , after merge this pr into branch-2.2, there is an error 
"stageAttemptNumber is not a member of org.apache.spark.TaskContext" in 
SparkHadoopMapRedUtil, I think it needs to merge PR-20082 first.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21577: [SPARK-24589][core] Correctly identify tasks in output c...

2018-06-21 Thread tgravescs
Github user tgravescs commented on the issue:

https://github.com/apache/spark/pull/21577
  
merged to master, 2.3, and 2.2


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21577: [SPARK-24589][core] Correctly identify tasks in output c...

2018-06-21 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/21577
  
> I pushed the change for that in: vanzin/spark@e6a862e

I like it, it's simpler to use task id to replace stage attempt id and task 
attempt id. For safety we should do it in master only after this PR is merged.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21577: [SPARK-24589][core] Correctly identify tasks in output c...

2018-06-21 Thread tgravescs
Github user tgravescs commented on the issue:

https://github.com/apache/spark/pull/21577
  
I will 


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21577: [SPARK-24589][core] Correctly identify tasks in output c...

2018-06-21 Thread vanzin
Github user vanzin commented on the issue:

https://github.com/apache/spark/pull/21577
  
So anyone wants to do the actual merging?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21577: [SPARK-24589][core] Correctly identify tasks in output c...

2018-06-21 Thread tgravescs
Github user tgravescs commented on the issue:

https://github.com/apache/spark/pull/21577
  
+1 

this is a bit of a side while looking through the scenarios I filed: 
https://issues.apache.org/jira/browse/SPARK-24622 . shouldn't be a problem here 
though with this fix.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21577: [SPARK-24589][core] Correctly identify tasks in output c...

2018-06-20 Thread tgravescs
Github user tgravescs commented on the issue:

https://github.com/apache/spark/pull/21577
  
the code here lgtm, I was trying to make one more pass through all the 
scenarios but got stuck in meetings, will try to do it later tonight or 
tomorrow morning but we can always have another follow up if we find another 
case. 




---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21577: [SPARK-24589][core] Correctly identify tasks in output c...

2018-06-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21577
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/92138/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21577: [SPARK-24589][core] Correctly identify tasks in output c...

2018-06-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21577
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21577: [SPARK-24589][core] Correctly identify tasks in output c...

2018-06-20 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21577
  
**[Test build #92138 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92138/testReport)**
 for PR 21577 at commit 
[`264c533`](https://github.com/apache/spark/commit/264c533737410786faae24df8cb5b27218f804cd).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21577: [SPARK-24589][core] Correctly identify tasks in output c...

2018-06-20 Thread vanzin
Github user vanzin commented on the issue:

https://github.com/apache/spark/pull/21577
  
I pushed the change for that in: 
https://github.com/vanzin/spark/commit/e6a862ecb83c64a0ea2f5bd469bc0febe25e15ba

In case anyone wants to take a look.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21577: [SPARK-24589][core] Correctly identify tasks in output c...

2018-06-20 Thread vanzin
Github user vanzin commented on the issue:

https://github.com/apache/spark/pull/21577
  
I filed SPARK-24611 to track some enhancements to this part of the code 
that have been discussed here. Of those, I'd consider the "use task IDs instead 
of TaskIdentifier" as something we could potentially do here, but at the same 
time I don't really want to delay this patch too much.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21577: [SPARK-24589][core] Correctly identify tasks in output c...

2018-06-20 Thread vanzin
Github user vanzin commented on the issue:

https://github.com/apache/spark/pull/21577
  
Sounds good to me (although I'm trying the change locally and unit tests 
are so far happy).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21577: [SPARK-24589][core] Correctly identify tasks in output c...

2018-06-20 Thread tgravescs
Github user tgravescs commented on the issue:

https://github.com/apache/spark/pull/21577
  
> Ah, right, d'oh. I just checked about whether stages register with the 
coordinator, and saw the duplicate registration for the resubmitted map stage.

Yeah I noticed that to but I think we should perhaps file separate jira and 
only do that in 2.4 and maybe 2.3.2 


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21577: [SPARK-24589][core] Correctly identify tasks in output c...

2018-06-20 Thread mridulm
Github user mridulm commented on the issue:

https://github.com/apache/spark/pull/21577
  
> * t2 finishes before that kill message arrives, is allowed to commit.
> If that can happen it would generate a duplicate map output; but my guess 
(hope?) is that the map output tracker would only keep one of them.

This should not happen after this PR for two reasons:
a) we do not clear status until stage finishes (which should be sufficient 
to prevent the bug in entirety)
b) In (other) cases where we allow a task to commit but then kill it 
(perhaps for other reasons - like user initiated kill, executor pre-emption, 
etc), the task failure will be recorded and the commit state for that partition 
will be cleared - and resubmitted task for partition will commit.

There is an inherent race in (b) always - where task is killed before task 
completion and after output commit - that is something we cannot fix, and which 
is to be handled/resolved by final job commit.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21577: [SPARK-24589][core] Correctly identify tasks in output c...

2018-06-20 Thread vanzin
Github user vanzin commented on the issue:

https://github.com/apache/spark/pull/21577
  
> if its a map stage then I don't expect it to be asking to commit.

Ah, right, d'oh. I just checked about whether stages register with the 
coordinator, and saw the duplicate registration for the resubmitted map stage.

I guess we could avoid that too. The scheduler does that explicitly, but if 
it doesn't serve any purpose, it would save some memory:

```
  case s: ShuffleMapStage =>
outputCommitCoordinator.stageStart(stage = s.id, maxPartitionId = 
s.numPartitions - 1)
```


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21577: [SPARK-24589][core] Correctly identify tasks in output c...

2018-06-20 Thread tgravescs
Github user tgravescs commented on the issue:

https://github.com/apache/spark/pull/21577
  
> The test I added can sort of illustrate that if you look at what happens. 
There are two stages (map stage 2, result stage 3), and the fetch failure 
causes a retry of stage 3 plus a resubmission of stage 2 - which means that 
stage 2 is starting in the coordinator with a fresh list of committers.
> 
> So if there's still a speculative task from stage 2's first run that 
hasn't been properly killed yet, it might be allowed to commit. But this being 
a map stage, I assume the map output tracker would take care of filtering out 
duplicates?

if its a map stage then I don't expect it to be asking to commit.  Is there 
a case you know that does? 




---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21577: [SPARK-24589][core] Correctly identify tasks in output c...

2018-06-20 Thread vanzin
Github user vanzin commented on the issue:

https://github.com/apache/spark/pull/21577
  
I was referring to a race caused by asynchronously killing speculative 
tasks. Granted it's incredibly unlikely to occur in real life:

- in s1a1 1, t1 and t2 are started for the same partition, t1 succeeds, a 
kill is sent for t2
- s1 finishes, coordinator state is cleared for that stage
- s2a1 fails, causes s1 to be re-submitted
- t2 finishes before that kill message arrives, is allowed to commit.

If that can happen it would generate a duplicate map output; but my guess 
(hope?) is that the map output tracker would only keep one of them.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21577: [SPARK-24589][core] Correctly identify tasks in output c...

2018-06-20 Thread tgravescs
Github user tgravescs commented on the issue:

https://github.com/apache/spark/pull/21577
  
>  * fixed the issue Mridul brought up, but I think the race that Tom 
describes still exists. I'm just not sure it would cause problems, since as far 
as I can tell it can only happen in a map stage, not a result stage.

@vanzin  which race were you referring to here, I think tracking the stage 
across attempts fixes both the ones I mentioned in reference to scenario 2 for 
Mridul

> There is another case though here where T1_1.1 could have just asked to 
be committed, but not yet committed, then if it gets delayed committing, the 
new stage attempt starts and T1_1.2 asks if it could commit and is granted, so 
then both try to commit at the same time causing corruption.

Fixed because T1_1.2 won't be allowed to commit because we track first 
state attempt as committing.

> The caveat there though would be if since T1_1.1 was committed, the 
second stage attempt could finish and call commitJob while T1_1.2 is committing 
since spark thinks it doesn't need to wait for T1_1.2. Anyway this seems very 
unlikely but we should protect against it.

T1_1.2 shouldn't ever be allowed to commit since we track across the 
attempts so it wouldn't ever commit after the stage itself has completed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21577: [SPARK-24589][core] Correctly identify tasks in output c...

2018-06-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21577
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21577: [SPARK-24589][core] Correctly identify tasks in output c...

2018-06-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21577
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/4252/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21577: [SPARK-24589][core] Correctly identify tasks in output c...

2018-06-20 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21577
  
**[Test build #92138 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92138/testReport)**
 for PR 21577 at commit 
[`264c533`](https://github.com/apache/spark/commit/264c533737410786faae24df8cb5b27218f804cd).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21577: [SPARK-24589][core] Correctly identify tasks in output c...

2018-06-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21577
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21577: [SPARK-24589][core] Correctly identify tasks in output c...

2018-06-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21577
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/356/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21577: [SPARK-24589][core] Correctly identify tasks in output c...

2018-06-20 Thread tgravescs
Github user tgravescs commented on the issue:

https://github.com/apache/spark/pull/21577
  
test this please


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21577: [SPARK-24589][core] Correctly identify tasks in output c...

2018-06-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21577
  
Merged build finished. Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21577: [SPARK-24589][core] Correctly identify tasks in output c...

2018-06-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21577
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/92109/
Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21577: [SPARK-24589][core] Correctly identify tasks in output c...

2018-06-19 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21577
  
**[Test build #92109 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92109/testReport)**
 for PR 21577 at commit 
[`264c533`](https://github.com/apache/spark/commit/264c533737410786faae24df8cb5b27218f804cd).
 * This patch **fails PySpark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21577: [SPARK-24589][core] Correctly identify tasks in output c...

2018-06-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21577
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/92106/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21577: [SPARK-24589][core] Correctly identify tasks in output c...

2018-06-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21577
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21577: [SPARK-24589][core] Correctly identify tasks in output c...

2018-06-19 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21577
  
**[Test build #92106 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92106/testReport)**
 for PR 21577 at commit 
[`5ece2f1`](https://github.com/apache/spark/commit/5ece2f12a820d6438146758f0e944f3b1c70d489).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21577: [SPARK-24589][core] Correctly identify tasks in output c...

2018-06-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21577
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/92104/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21577: [SPARK-24589][core] Correctly identify tasks in output c...

2018-06-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21577
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21577: [SPARK-24589][core] Correctly identify tasks in output c...

2018-06-19 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21577
  
**[Test build #92104 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92104/testReport)**
 for PR 21577 at commit 
[`0e91f1b`](https://github.com/apache/spark/commit/0e91f1b49ee4720c9b89a2090080efd7c89ccaf4).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21577: [SPARK-24589][core] Correctly identify tasks in output c...

2018-06-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21577
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21577: [SPARK-24589][core] Correctly identify tasks in output c...

2018-06-19 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21577
  
**[Test build #92109 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92109/testReport)**
 for PR 21577 at commit 
[`264c533`](https://github.com/apache/spark/commit/264c533737410786faae24df8cb5b27218f804cd).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21577: [SPARK-24589][core] Correctly identify tasks in output c...

2018-06-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21577
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21577: [SPARK-24589][core] Correctly identify tasks in output c...

2018-06-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21577
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/4233/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21577: [SPARK-24589][core] Correctly identify tasks in output c...

2018-06-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21577
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/337/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21577: [SPARK-24589][core] Correctly identify tasks in output c...

2018-06-19 Thread mridulm
Github user mridulm commented on the issue:

https://github.com/apache/spark/pull/21577
  
Thanks for the changes @vanzin, looks good to me !
Ideally would have been great to test the speculative execution part as 
well; but that would be fairly nasty to reliably reproduce I guess.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21577: [SPARK-24589][core] Correctly identify tasks in output c...

2018-06-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21577
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/4230/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21577: [SPARK-24589][core] Correctly identify tasks in output c...

2018-06-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21577
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21577: [SPARK-24589][core] Correctly identify tasks in output c...

2018-06-19 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21577
  
**[Test build #92106 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92106/testReport)**
 for PR 21577 at commit 
[`5ece2f1`](https://github.com/apache/spark/commit/5ece2f12a820d6438146758f0e944f3b1c70d489).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21577: [SPARK-24589][core] Correctly identify tasks in output c...

2018-06-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21577
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21577: [SPARK-24589][core] Correctly identify tasks in output c...

2018-06-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21577
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/334/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21577: [SPARK-24589][core] Correctly identify tasks in output c...

2018-06-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21577
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/92096/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21577: [SPARK-24589][core] Correctly identify tasks in output c...

2018-06-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21577
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21577: [SPARK-24589][core] Correctly identify tasks in output c...

2018-06-19 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21577
  
**[Test build #92096 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92096/testReport)**
 for PR 21577 at commit 
[`adb0d18`](https://github.com/apache/spark/commit/adb0d18fa1bb43488245b7d6b7ee02d4997d6215).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21577: [SPARK-24589][core] Correctly identify tasks in output c...

2018-06-19 Thread vanzin
Github user vanzin commented on the issue:

https://github.com/apache/spark/pull/21577
  
> I think the commit/delete thing is also an issue for existing v1 and 
hadoop committers as well.

I took a look at that code and I agree with you. It's actually quite 
annoying how "task id" and "task attempt id" are used interchangeably with 
"task attempt number" in a number of places when they aren't the same thing.

Anyway, I think that should either be a separate change or perhaps 
integrated with @rdblue's patch, since it's more similar to that one. The fix 
should actually be quite simple: `createTaskAttemptContext` should take the 
actual task attempt id (type `Long`) instead of the attempt number as an 
argument.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21577: [SPARK-24589][core] Correctly identify tasks in output c...

2018-06-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21577
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21577: [SPARK-24589][core] Correctly identify tasks in output c...

2018-06-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21577
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/4226/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21577: [SPARK-24589][core] Correctly identify tasks in output c...

2018-06-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21577
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21577: [SPARK-24589][core] Correctly identify tasks in output c...

2018-06-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21577
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/332/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21577: [SPARK-24589][core] Correctly identify tasks in output c...

2018-06-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21577
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/331/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21577: [SPARK-24589][core] Correctly identify tasks in output c...

2018-06-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21577
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21577: [SPARK-24589][core] Correctly identify tasks in output c...

2018-06-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21577
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/4225/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21577: [SPARK-24589][core] Correctly identify tasks in output c...

2018-06-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21577
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21577: [SPARK-24589][core] Correctly identify tasks in output c...

2018-06-19 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21577
  
**[Test build #92104 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92104/testReport)**
 for PR 21577 at commit 
[`0e91f1b`](https://github.com/apache/spark/commit/0e91f1b49ee4720c9b89a2090080efd7c89ccaf4).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21577: [SPARK-24589][core] Correctly identify tasks in output c...

2018-06-19 Thread vanzin
Github user vanzin commented on the issue:

https://github.com/apache/spark/pull/21577
  
Also, one idea to try to fix the remaining race would be to move the 
commit-related state to the `org.apache.spark.scheduler.Stage` class, which is 
reused across attempts (even in the resubmission case I mentioned), and keep 
the coordinator class purely for the RPC stuff.

But that would be a bigger change.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21577: [SPARK-24589][core] Correctly identify tasks in output c...

2018-06-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21577
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21577: [SPARK-24589][core] Correctly identify tasks in output c...

2018-06-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21577
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/325/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21577: [SPARK-24589][core] Correctly identify tasks in output c...

2018-06-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21577
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21577: [SPARK-24589][core] Correctly identify tasks in output c...

2018-06-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21577
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/4219/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21577: [SPARK-24589][core] Correctly identify tasks in output c...

2018-06-19 Thread vanzin
Github user vanzin commented on the issue:

https://github.com/apache/spark/pull/21577
  
A few notes about the latest updates:

- I reverted the `TaskCommitDenied` changes so that this patch can be 
backported more easily. I'm not against the change but I think it'd be better 
if we made it only in master, so we can postpone it. The information there is 
also not critical, since the end reason is generally attached to a task, which 
has the needed info anyway.

- I fixed the issue Mridul brought up, but I think the race that Tom 
describes still exists. I'm just not sure it would cause problems, since as far 
as I can tell it can only happen in a map stage, not a result stage.

The test I added can sort of illustrate that if you look at what happens. 
There are two stages (map stage 2, result stage 3), and the fetch failure 
causes a retry of stage 3 *plus* a resubmission of stage 2 - which means that 
stage 2 is starting in the committer with a fresh list of committers.

So if there's still a speculative task from stage 2 that hasn't been 
properly killed, it might be allowed to commit. But this being a map stage, I 
assume the map output tracker would take care of filtering out duplicates?

That's obviously really hard to hit, but if it can be an issue we could 
look at it separately.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21577: [SPARK-24589][core] Correctly identify tasks in output c...

2018-06-19 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21577
  
**[Test build #92096 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92096/testReport)**
 for PR 21577 at commit 
[`adb0d18`](https://github.com/apache/spark/commit/adb0d18fa1bb43488245b7d6b7ee02d4997d6215).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21577: [SPARK-24589][core] Correctly identify tasks in output c...

2018-06-19 Thread tgravescs
Github user tgravescs commented on the issue:

https://github.com/apache/spark/pull/21577
  
I'm fine with separating them but we need a jira or need to update the v2 
jira to handle all cases


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21577: [SPARK-24589][core] Correctly identify tasks in output c...

2018-06-19 Thread jiangxb1987
Github user jiangxb1987 commented on the issue:

https://github.com/apache/spark/pull/21577
  
This in general looks good,  IMO we shall focus on fixing the output commit 
coordinator issue in this PR, and discuss the data source issue in a separated 
thread.
I'm OOO this week but will still look into more detail on this issue.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21577: [SPARK-24589][core] Correctly identify tasks in output c...

2018-06-19 Thread tgravescs
Github user tgravescs commented on the issue:

https://github.com/apache/spark/pull/21577
  
So I think the commit/delete thing is also an issue for existing v1 and 
hadoop committers as well.  So this doesn't fully solve the problem.  spark 
uses a file format like 
(HadoopMapReduceWriteConfigUtil/HadoopMapRedWriteConfigUtil): 
{date}_{rddid}_{m/r}_{partitionid}_{task attempt number}
I believe the same fix as the v2 would work using the taskAttemptId instead 
of the attemptNumber.  (see 

In the case we have the stage failure and a second stage attempt the task 
attempt number could be the same and thus both tasks write to the same place.  
If one of them fails or is told not to commit it could delete the output which 
is being used by both.

Need to think through all the scenarios to make sure its covered. 


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21577: [SPARK-24589][core] Correctly identify tasks in output c...

2018-06-18 Thread vanzin
Github user vanzin commented on the issue:

https://github.com/apache/spark/pull/21577
  
FYI I plan to fix the mima issue later. Haven't decided whether to revert 
the change or just add excludes... probably the latter since it's a developer 
api.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org