[GitHub] spark pull request: [SPARK-4014] Change TaskContext.attemptId to r...

2014-12-31 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/3849#issuecomment-68429438
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24945/
Test FAILed.





[GitHub] spark pull request: [SPARK-4014] Change TaskContext.attemptId to r...

2014-12-31 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3849#issuecomment-68429433
  
  [Test build #24945 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24945/consoleFull) for PR 3849 at commit [`8c387ce`](https://github.com/apache/spark/commit/8c387ce76850caf2163dd82dc5c79b079788a921).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-4014] Change TaskContext.attemptId to r...

2014-12-31 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3849#issuecomment-68432871
  
  [Test build #24955 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24955/consoleFull) for PR 3849 at commit [`0b10526`](https://github.com/apache/spark/commit/0b1052611f7fd7f71232ba3f2c0505e5711080e1).
 * This patch merges cleanly.





[GitHub] spark pull request: [SPARK-4014] Change TaskContext.attemptId to r...

2014-12-31 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/3849#issuecomment-68437209
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24955/
Test FAILed.





[GitHub] spark pull request: [SPARK-4014] Change TaskContext.attemptId to r...

2014-12-31 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3849#issuecomment-68437206
  
  [Test build #24955 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24955/consoleFull) for PR 3849 at commit [`0b10526`](https://github.com/apache/spark/commit/0b1052611f7fd7f71232ba3f2c0505e5711080e1).
 * This patch **fails PySpark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-4014] Change TaskContext.attemptId to r...

2014-12-31 Thread JoshRosen
Github user JoshRosen commented on the pull request:

https://github.com/apache/spark/pull/3849#issuecomment-68453172
  
Hmm, looks like this change somehow broke a PySpark Streaming test:

```
======================================================================
FAIL: Basic operation test for DStream.mapPartitions.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "pyspark/streaming/tests.py", line 228, in test_mapPartitions
    self._test_func(rdds, func, expected)
  File "pyspark/streaming/tests.py", line 114, in _test_func
    self.assertEqual(expected, result)
AssertionError: [[3, 7], [11, 15], [19, 23]] != []
```





[GitHub] spark pull request: [SPARK-4014] Change TaskContext.attemptId to r...

2014-12-30 Thread JoshRosen
GitHub user JoshRosen opened a pull request:

https://github.com/apache/spark/pull/3849

[SPARK-4014] Change TaskContext.attemptId to return attempt number instead 
of task ID

This patch modifies `TaskContext.attemptId` to return an attempt number, 
which conveys how many times a task has been attempted, instead of a taskId, 
which uniquely identifies a particular task attempt within a particular 
SparkContext.  Prior to this change, it was impossible to determine whether a 
task was being re-attempted (or was a speculative copy), which made it 
difficult to write unit tests for tasks that fail on early attempts or 
speculative tasks that complete faster than original tasks.

I've introduced a new `TaskContext.taskId` field which returns the old 
value.

Most of this patch is fairly straightforward, but there is a bit of 
trickiness related to Mesos tasks: since there's no field in MesosTaskInfo to 
encode the attemptId, I packed it into the `data` field alongside the task 
binary.
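To make the packing concrete, here is a minimal sketch of the round trip (a sketch only: the `encodeAttemptAndTask`/`decodeAttemptAndTask` names and the fixed 4-byte header are illustrative, not the patch's actual layout):

```
import java.nio.ByteBuffer

// Hypothetical sketch: prepend the attempt number as a fixed-width header so
// it and the task binary can share MesosTaskInfo's single `data` field.
def encodeAttemptAndTask(attemptNumber: Int, serializedTask: ByteBuffer): ByteBuffer = {
  val out = ByteBuffer.allocate(4 + serializedTask.remaining())
  out.putInt(attemptNumber)   // 4-byte header
  out.put(serializedTask)     // task binary follows immediately
  out.flip()                  // rewind so the receiver reads from the start
  out
}

// Receive side: consume the header, leaving the rest as the task binary.
def decodeAttemptAndTask(data: ByteBuffer): (Int, ByteBuffer) = {
  val attemptNumber = data.getInt()
  (attemptNumber, data.slice())
}
```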

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/JoshRosen/spark SPARK-4014

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/3849.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3849


commit fd515a56e9e66a2f7ab70187fbd463046f1cb330
Author: Josh Rosen <joshro...@databricks.com>
Date:   2014-12-30T22:00:46Z

Add failing test for SPARK-4014

commit 1e7a933f41936e134e07c32aaf47bbcf8938167c
Author: Josh Rosen <joshro...@databricks.com>
Date:   2014-12-30T22:01:21Z

[SPARK-4014] Change TaskContext.attemptId to return attempt number instead 
of task ID.

commit 9d8d4d115449f48ce1714497e16704d4f12442d8
Author: Josh Rosen <joshro...@databricks.com>
Date:   2014-12-30T22:06:19Z

Doc typo







[GitHub] spark pull request: [SPARK-4014] Change TaskContext.attemptId to r...

2014-12-30 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request:

https://github.com/apache/spark/pull/3849#discussion_r22367027
  
--- Diff: core/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosSchedulerBackend.scala ---
@@ -290,13 +291,22 @@ private[spark] class MesosSchedulerBackend(
       .setType(Value.Type.SCALAR)
       .setScalar(Value.Scalar.newBuilder().setValue(scheduler.CPUS_PER_TASK).build())
       .build()
+    // Encode the attemptId as part of the data payload, since there's not a MesosTaskInfo field
+    // to hold it:
+    val data = {
--- End diff --

Here's the Mesos trickiness that I alluded to in the PR description.





[GitHub] spark pull request: [SPARK-4014] Change TaskContext.attemptId to r...

2014-12-30 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request:

https://github.com/apache/spark/pull/3849#discussion_r22367053
  
--- Diff: core/src/main/scala/org/apache/spark/executor/MesosExecutorBackend.scala ---
@@ -77,10 +77,13 @@ private[spark] class MesosExecutorBackend
 
   override def launchTask(d: ExecutorDriver, taskInfo: TaskInfo) {
     val taskId = taskInfo.getTaskId.getValue.toLong
+    val attemptIdPlusData = taskInfo.getData.asReadOnlyByteBuffer()
--- End diff --

And here's the corresponding receive-side of the Mesos trickiness.





[GitHub] spark pull request: [SPARK-4014] Change TaskContext.attemptId to r...

2014-12-30 Thread JoshRosen
Github user JoshRosen commented on the pull request:

https://github.com/apache/spark/pull/3849#issuecomment-68404316
  
/cc @koeninger, who raised this issue on the mailing list, and @yhuai, who 
filed the original JIRA issue.

Also, /cc @pwendell, @andrewor14, and @tdas for review.

Technically, this changes the behavior of `attemptId`; I'd argue that the 
old behavior was a bug, but we should consider whether we need to preserve it 
and introduce a new method which returns the attempt number.  I don't think 
there's anything particularly useful that you can do with the taskId, but maybe 
I'm mistaken.

Also, "attempt number" would be a better name than "attempt ID", but I 
think we're kind of stuck with `attemptId` for binary compatibility reasons.





[GitHub] spark pull request: [SPARK-4014] Change TaskContext.attemptId to r...

2014-12-30 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3849#issuecomment-68404370
  
  [Test build #24908 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24908/consoleFull) for PR 3849 at commit [`9d8d4d1`](https://github.com/apache/spark/commit/9d8d4d115449f48ce1714497e16704d4f12442d8).
 * This patch merges cleanly.





[GitHub] spark pull request: [SPARK-4014] Change TaskContext.attemptId to r...

2014-12-30 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/3849#discussion_r22367485
  
--- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala ---
@@ -635,7 +635,7 @@ class DAGScheduler(
       val rdd = job.finalStage.rdd
       val split = rdd.partitions(job.partitions(0))
       val taskContext =
-        new TaskContextImpl(job.finalStage.id, job.partitions(0), 0, true)
+        new TaskContextImpl(job.finalStage.id, job.partitions(0), 0, 0, true)
--- End diff --

can we use named argument for the two zeros and the true?
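    For illustration, a standalone sketch of the suggestion, using a stand-in `TaskContextStub` case class rather than the real `TaskContextImpl`:

```
// Stand-in with the same parameter shape as the new TaskContextImpl.
case class TaskContextStub(stageId: Int, partitionId: Int, taskId: Long,
    attemptId: Long, runningLocally: Boolean)

// Before: a reader can't tell what the two zeros and the `true` mean.
val anonymous = TaskContextStub(1, 0, 0, 0, true)

// After: the call site documents itself, and two same-typed arguments can
// no longer be silently swapped.
val named = TaskContextStub(stageId = 1, partitionId = 0, taskId = 0L,
  attemptId = 0L, runningLocally = true)
```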





[GitHub] spark pull request: [SPARK-4014] Change TaskContext.attemptId to r...

2014-12-30 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/3849#discussion_r22367512
  
--- Diff: core/src/main/scala/org/apache/spark/scheduler/Task.scala ---
@@ -44,8 +44,9 @@ import org.apache.spark.util.Utils
  */
 private[spark] abstract class Task[T](val stageId: Int, var partitionId: Int) extends Serializable {
 
-  final def run(attemptId: Long): T = {
-    context = new TaskContextImpl(stageId, partitionId, attemptId, runningLocally = false)
+  final def run(taskId: Long, attemptId: Long): T = {
--- End diff --

would be great to add javadoc here





[GitHub] spark pull request: [SPARK-4014] Change TaskContext.attemptId to r...

2014-12-30 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/3849#discussion_r22367524
  
--- Diff: core/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosSchedulerBackend.scala ---
@@ -290,13 +291,22 @@ private[spark] class MesosSchedulerBackend(
       .setType(Value.Type.SCALAR)
       .setScalar(Value.Scalar.newBuilder().setValue(scheduler.CPUS_PER_TASK).build())
       .build()
+    // Encode the attemptId as part of the data payload, since there's not a MesosTaskInfo field
+    // to hold it:
+    val data = {
--- End diff --

can we type data explicitly?
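    A standalone sketch of the suggestion (Mesos's `TaskInfo.data` field holds a protobuf `ByteString`; the payload bytes here are made up):

```
import com.google.protobuf.ByteString
import java.nio.ByteBuffer

// An explicit type annotation on `data` tells readers what the initializer
// block produces without making them infer it from the last expression.
val payload: ByteBuffer = ByteBuffer.wrap(Array[Byte](0, 1, 2))
val data: ByteString = ByteString.copyFrom(payload)
```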





[GitHub] spark pull request: [SPARK-4014] Change TaskContext.attemptId to r...

2014-12-30 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request:

https://github.com/apache/spark/pull/3849#discussion_r22367585
  
--- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala ---
@@ -635,7 +635,7 @@ class DAGScheduler(
       val rdd = job.finalStage.rdd
       val split = rdd.partitions(job.partitions(0))
       val taskContext =
-        new TaskContextImpl(job.finalStage.id, job.partitions(0), 0, true)
+        new TaskContextImpl(job.finalStage.id, job.partitions(0), 0, 0, true)
--- End diff --

Good call; I didn't do this for the test code, but this line is in 
DAGScheduler so it should use named arguments.





[GitHub] spark pull request: [SPARK-4014] Change TaskContext.attemptId to r...

2014-12-30 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/3849#discussion_r22367584
  
--- Diff: core/src/main/scala/org/apache/spark/executor/Executor.scala ---
@@ -180,7 +189,7 @@ private[spark] class Executor(
 
         // Run the actual task and measure its runtime.
         taskStart = System.currentTimeMillis()
-        val value = task.run(taskId.toInt)
+        val value = task.run(taskId = taskId.toLong, attemptId = attemptId.toLong)
--- End diff --

the toLong's are not necessary, are they?





[GitHub] spark pull request: [SPARK-4014] Change TaskContext.attemptId to r...

2014-12-30 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request:

https://github.com/apache/spark/pull/3849#discussion_r22367616
  
--- Diff: core/src/main/scala/org/apache/spark/scheduler/Task.scala ---
@@ -44,8 +44,9 @@ import org.apache.spark.util.Utils
  */
 private[spark] abstract class Task[T](val stageId: Int, var partitionId: Int) extends Serializable {
 
-  final def run(attemptId: Long): T = {
-    context = new TaskContextImpl(stageId, partitionId, attemptId, runningLocally = false)
+  final def run(taskId: Long, attemptId: Long): T = {
--- End diff --

Are the descriptions in TaskContext okay?  If so, I'll just copy those for 
the parameters.
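    For reference, a sketch of what such Scaladoc might look like, with wording adapted from the PR description (illustrative, not the committed text):

```
abstract class Task[T] {
  /**
   * Called by the executor to run this task.
   *
   * @param taskId an identifier that is unique to this task attempt within its SparkContext
   * @param attemptId how many times this task has been attempted (0 for the first attempt)
   * @return the result of the task
   */
  def run(taskId: Long, attemptId: Long): T
}
```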





[GitHub] spark pull request: [SPARK-4014] Change TaskContext.attemptId to r...

2014-12-30 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request:

https://github.com/apache/spark/pull/3849#discussion_r22367600
  
--- Diff: core/src/main/scala/org/apache/spark/executor/Executor.scala ---
@@ -180,7 +189,7 @@ private[spark] class Executor(
 
         // Run the actual task and measure its runtime.
         taskStart = System.currentTimeMillis()
-        val value = task.run(taskId.toInt)
+        val value = task.run(taskId = taskId.toLong, attemptId = attemptId.toLong)
--- End diff --

Good catch; this was a carryover from the old code.





[GitHub] spark pull request: [SPARK-4014] Change TaskContext.attemptId to r...

2014-12-30 Thread koeninger
Github user koeninger commented on the pull request:

https://github.com/apache/spark/pull/3849#issuecomment-68405794
  
Thanks for this.  Most of the uses of attemptId I've seen look like they 
were assuming it meant the 0-based attempt number.
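As an example of what the corrected semantics enable, here is a hypothetical snippet that fails each task's first attempt and succeeds on the retry; it assumes the accessor is ultimately exposed as `TaskContext.get().attemptNumber()` (as in the test added later in this thread) and a master that tolerates task failures:

```
import org.apache.spark.{SparkContext, TaskContext}

// Hypothetical: with a 0-based attempt number, user code (and tests) can
// distinguish a first attempt from a retry.
def doubledWithOneRetry(sc: SparkContext): Array[Int] = {
  sc.parallelize(1 to 4, 2).map { x =>
    if (TaskContext.get().attemptNumber() == 0) {
      throw new RuntimeException("simulated failure on first attempt")
    }
    x * 2
  }.collect()
}
```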





[GitHub] spark pull request: [SPARK-4014] Change TaskContext.attemptId to r...

2014-12-30 Thread pwendell
Github user pwendell commented on the pull request:

https://github.com/apache/spark/pull/3849#issuecomment-68406594
  
So personally I don't think we should change the semantics of `attemptId` 
because this has been exposed to user applications and they could silently 
break if we modify the meaning of the field (my original JIRA referred to an 
internal use of this). What it means right now is a global GUID over all 
attempts - that is a bit of an awkward definition, but I don't think it's fair 
to call this a bug - it was just a weird definition.

So I'd be in favor of deprecating this in favor of `taskAttemptId` (a new 
field) and say that it was renamed to avoid confusion. Then we can add another 
field, `attemptCount` or `attemptNum` or something to convey the more intuitive 
thing.

It will be slightly awkward, but if anyone reads the docs it should be 
obvious. In fact, we should probably spruce up the docs here for things like 
`partitionID` which right now are probably not super clear to users.
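A sketch of what that deprecation path might look like on `TaskContext` (the version string and doc wording are illustrative):

```
abstract class TaskContext {
  /** An ID that is unique to this task attempt within its SparkContext. */
  def taskAttemptId(): Long

  /** How many times this task has been attempted; the first attempt is 0. */
  def attemptNumber(): Int

  @deprecated("use taskAttemptId() or attemptNumber()", "1.3.0")
  def attemptId(): Long = taskAttemptId()
}
```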

 






[GitHub] spark pull request: [SPARK-4014] Change TaskContext.attemptId to r...

2014-12-30 Thread koeninger
Github user koeninger commented on the pull request:

https://github.com/apache/spark/pull/3849#issuecomment-68407388
  
The flip side is that it's already documented as doing the right thing:


http://spark.apache.org/docs/1.1.1/api/scala/index.html#org.apache.spark.TaskContext

val attemptId: Long

the number of attempts to execute this task




[GitHub] spark pull request: [SPARK-4014] Change TaskContext.attemptId to r...

2014-12-30 Thread pwendell
Github user pwendell commented on the pull request:

https://github.com/apache/spark/pull/3849#issuecomment-68407800
  
Ah I see - I didn't see the doc. I'm more on the fence in this case 
(because there was a doc that created a specification). So I guess I'm fine 
either way.





[GitHub] spark pull request: [SPARK-4014] Change TaskContext.attemptId to r...

2014-12-30 Thread JoshRosen
Github user JoshRosen commented on the pull request:

https://github.com/apache/spark/pull/3849#issuecomment-68407820
  
Annoyingly, it looks like ScalaDoc doesn't display Javadoc annotations, so 
in the Scala documentation for 1.2.0 the TaskContext class appears to have lost 
all documentation, even though it still shows up in the Java docs:

- 
http://spark.apache.org/docs/1.2.0/api/scala/index.html#org.apache.spark.TaskContext
- 
https://spark.apache.org/docs/1.2.0/api/java/org/apache/spark/TaskContext.html

I don't know how the docs for `attemptId` managed to get lost during the 
TaskContext stabilization patch, but this suggests that we need a better review 
checklist for public APIs which mandates full Scaladoc / Javadoc for all public 
interfaces.





[GitHub] spark pull request: [SPARK-4014] Change TaskContext.attemptId to r...

2014-12-30 Thread JoshRosen
Github user JoshRosen commented on the pull request:

https://github.com/apache/spark/pull/3849#issuecomment-68408000
  
One potential consideration is backporting: if we agree that `attemptId`'s 
meaning should be changed to reflect the 1.1.1 documentation, then we should 
make the same fix in `branch-1.1`, which will require a separate patch because 
that branch uses a different implementation of TaskContext.





[GitHub] spark pull request: [SPARK-4014] Change TaskContext.attemptId to r...

2014-12-30 Thread pwendell
Github user pwendell commented on the pull request:

https://github.com/apache/spark/pull/3849#issuecomment-68408128
  
Hm - we probably just shouldn't backport it; again, I think users might be 
depending on the old behavior, and putting it in a patch release is a bit iffy.






[GitHub] spark pull request: [SPARK-4014] Change TaskContext.attemptId to r...

2014-12-30 Thread JoshRosen
Github user JoshRosen commented on the pull request:

https://github.com/apache/spark/pull/3849#issuecomment-68408407
  
Should we at least leave a documentation note to inform users about the 
difference in behavior?  I'm worried that someone will look at the 1.2 docs, 
write some code which relies on the correct behavior, then be surprised if they 
run it on an older release.





[GitHub] spark pull request: [SPARK-4014] Change TaskContext.attemptId to r...

2014-12-30 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/3849#issuecomment-68410782
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24908/
Test FAILed.





[GitHub] spark pull request: [SPARK-4014] Change TaskContext.attemptId to r...

2014-12-30 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3849#issuecomment-68410774
  
  [Test build #24908 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24908/consoleFull) for PR 3849 at commit [`9d8d4d1`](https://github.com/apache/spark/commit/9d8d4d115449f48ce1714497e16704d4f12442d8).
 * This patch **fails MiMa tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `case class Sort(`






[GitHub] spark pull request: [SPARK-4014] Change TaskContext.attemptId to r...

2014-12-30 Thread JoshRosen
Github user JoshRosen commented on the pull request:

https://github.com/apache/spark/pull/3849#issuecomment-68412414
  
Hmm, looks like this failed MiMa checks due to the addition of a new method 
to a public interface:

```
[info] spark-core: found 1 potential binary incompatibilities (filtered 185)
[error]  * abstract method taskId()Long in class org.apache.spark.TaskContext does not have a correspondent in old version
[error]    filter with: ProblemFilters.exclude[MissingMethodProblem]("org.apache.spark.TaskContext.taskId")
```

I'll add an exclusion.
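In Spark's build these exclusions live in `project/MimaExcludes.scala`; a sketch of the entry, reusing the filter string MiMa printed above:

```
import com.typesafe.tools.mima.core._

object MimaExcludesSketch {
  // Tell MiMa that the new TaskContext.taskId method is an intentional
  // addition rather than an accidental break.
  val excludes = Seq(
    ProblemFilters.exclude[MissingMethodProblem]("org.apache.spark.TaskContext.taskId")
  )
}
```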





[GitHub] spark pull request: [SPARK-4014] Change TaskContext.attemptId to r...

2014-12-30 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request:

https://github.com/apache/spark/pull/3849#discussion_r22372823
  
--- Diff: core/src/main/scala/org/apache/spark/executor/Executor.scala ---
@@ -180,7 +189,7 @@ private[spark] class Executor(
 
         // Run the actual task and measure its runtime.
         taskStart = System.currentTimeMillis()
-        val value = task.run(taskId.toInt)
+        val value = task.run(taskId = taskId.toLong, attemptId = attemptId.toLong)
--- End diff --

It turns out that `attemptId` was a long but we only expected it to hold 
values that could fit in an int, hence this weird `.toLong` plus a few 
`% Int.MaxValue` modulus calls in various places; I've cleaned this up.





[GitHub] spark pull request: [SPARK-4014] Change TaskContext.attemptId to r...

2014-12-30 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3849#issuecomment-68418485
  
  [Test build #24923 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24923/consoleFull) for PR 3849 at commit [`cbe4d76`](https://github.com/apache/spark/commit/cbe4d76a7968c467514c290fc56ece713225970a).
 * This patch merges cleanly.





[GitHub] spark pull request: [SPARK-4014] Change TaskContext.attemptId to r...

2014-12-30 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3849#issuecomment-68421615
  
  [Test build #24923 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24923/consoleFull) for PR 3849 at commit [`cbe4d76`](https://github.com/apache/spark/commit/cbe4d76a7968c467514c290fc56ece713225970a).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-4014] Change TaskContext.attemptId to r...

2014-12-30 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/3849#issuecomment-68421619
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24923/
Test FAILed.





[GitHub] spark pull request: [SPARK-4014] Change TaskContext.attemptId to r...

2014-12-30 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request:

https://github.com/apache/spark/pull/3849#discussion_r22375124
  
--- Diff: core/src/test/scala/org/apache/spark/scheduler/TaskContextSuite.scala ---
@@ -63,6 +63,33 @@ class TaskContextSuite extends FunSuite with BeforeAndAfter with LocalSparkContext {
 
     verify(listener, times(1)).onTaskCompletion(any())
   }
+
+  test("TaskContext.attemptNumber should return attempt number, not task id (SPARK-4014)") {
+    sc = new SparkContext("local-cluster[2,1,512]", "test")
--- End diff --

Just realized that I can change the `maxFailures` property on `local` 
instead of having to use `local-cluster`.  Let me make that change, since it's 
a better practice and will speed up the tests.
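    Concretely, the second number in a `local[N, F]` master string sets how many task failures are tolerated, so retries can happen in-process without a `local-cluster` (a sketch; the particular values are illustrative):

```
import org.apache.spark.SparkContext

// 2 worker threads, up to 4 failures per task: a task that throws on its
// first attempt can still be retried and succeed, with attemptNumber > 0.
val sc = new SparkContext("local[2, 4]", "test")
```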





[GitHub] spark pull request: [SPARK-4014] Change TaskContext.attemptId to r...

2014-12-30 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3849#issuecomment-68426611
  
  [Test build #24945 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24945/consoleFull) for PR 3849 at commit [`8c387ce`](https://github.com/apache/spark/commit/8c387ce76850caf2163dd82dc5c79b079788a921).
 * This patch merges cleanly.

