[GitHub] spark pull request: [SPARK-4014] Change TaskContext.attemptId to r...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3849#issuecomment-68429438 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24945/ Test FAILed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-4014] Change TaskContext.attemptId to r...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3849#issuecomment-68429433 [Test build #24945 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24945/consoleFull) for PR 3849 at commit [`8c387ce`](https://github.com/apache/spark/commit/8c387ce76850caf2163dd82dc5c79b079788a921). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-4014] Change TaskContext.attemptId to r...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3849#issuecomment-68432871 [Test build #24955 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24955/consoleFull) for PR 3849 at commit [`0b10526`](https://github.com/apache/spark/commit/0b1052611f7fd7f71232ba3f2c0505e5711080e1). * This patch merges cleanly. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-4014] Change TaskContext.attemptId to r...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3849#issuecomment-68437209 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24955/ Test FAILed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-4014] Change TaskContext.attemptId to r...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3849#issuecomment-68437206 [Test build #24955 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24955/consoleFull) for PR 3849 at commit [`0b10526`](https://github.com/apache/spark/commit/0b1052611f7fd7f71232ba3f2c0505e5711080e1). * This patch **fails PySpark unit tests**. * This patch merges cleanly. * This patch adds no public classes. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-4014] Change TaskContext.attemptId to r...
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/3849#issuecomment-68453172 Hmm, looks like this change somehow broke a PySpark Streaming test: ``` == FAIL: Basic operation test for DStream.mapPartitions. -- Traceback (most recent call last): File pyspark/streaming/tests.py, line 228, in test_mapPartitions self._test_func(rdds, func, expected) File pyspark/streaming/tests.py, line 114, in _test_func self.assertEqual(expected, result) AssertionError: [[3, 7], [11, 15], [19, 23]] != [] ``` --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-4014] Change TaskContext.attemptId to r...
GitHub user JoshRosen opened a pull request: https://github.com/apache/spark/pull/3849 [SPARK-4014] Change TaskContext.attemptId to return attempt number instead of task ID This patch modifies `TaskContext.attemptId` to return an attempt number, which conveys how many times a task has been attempted, instead of a taskId, which uniquely identifies a particular task attempt within a particular SparkContext. Prior to this change, it was impossible to determine whether a task was being re-attempted (or was a speculative copy), which made it difficult to write unit tests for tasks that fail on early attempts or speculative tasks that complete faster than original tasks. I've introduced a new `TaskContext.taskId` field which returns the old value. Most of this patch is fairly straightforward, but there is a bit of trickiness related to Mesos tasks: since there's no field in MesosTaskInfo to encode the attemptId, I packed it into the `data` field alongside the task binary. You can merge this pull request into a Git repository by running: $ git pull https://github.com/JoshRosen/spark SPARK-4014 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/3849.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #3849 commit fd515a56e9e66a2f7ab70187fbd463046f1cb330 Author: Josh Rosen joshro...@databricks.com Date: 2014-12-30T22:00:46Z Add failing test for SPARK-4014 commit 1e7a933f41936e134e07c32aaf47bbcf8938167c Author: Josh Rosen joshro...@databricks.com Date: 2014-12-30T22:01:21Z [SPARK-4014] Change TaskContext.attemptId to return attempt number instead of task ID. commit 9d8d4d115449f48ce1714497e16704d4f12442d8 Author: Josh Rosen joshro...@databricks.com Date: 2014-12-30T22:06:19Z Doc typo --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-4014] Change TaskContext.attemptId to r...
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/3849#discussion_r22367027 --- Diff: core/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosSchedulerBackend.scala --- @@ -290,13 +291,22 @@ private[spark] class MesosSchedulerBackend( .setType(Value.Type.SCALAR) .setScalar(Value.Scalar.newBuilder().setValue(scheduler.CPUS_PER_TASK).build()) .build() +// Encode the attemptId as part of the data payload, since there's not a MesosTaskInfo field +// to hold it: +val data = { --- End diff -- Here's the Mesos trickiness that I alluded to in the PR description. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-4014] Change TaskContext.attemptId to r...
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/3849#discussion_r22367053 --- Diff: core/src/main/scala/org/apache/spark/executor/MesosExecutorBackend.scala --- @@ -77,10 +77,13 @@ private[spark] class MesosExecutorBackend override def launchTask(d: ExecutorDriver, taskInfo: TaskInfo) { val taskId = taskInfo.getTaskId.getValue.toLong +val attemptIdPlusData = taskInfo.getData.asReadOnlyByteBuffer() --- End diff -- And here's the corresponding receive-side of the Mesos trickiness. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-4014] Change TaskContext.attemptId to r...
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/3849#issuecomment-68404316 /cc @koeninger, who raised this issue on the mailing list, and @yhuai, who filed the original JIRA issue. Also, /cc @pwendell, @andrewor14, and @tdas for review. Technically, this changes the behavior of `attemptId`; I'd argue that the old behavior was a bug, but we should consider whether we need to preserve it and introduce a new method which returns the attempt number. I don't think there's anything particularly useful that you can do with the taskId, but maybe I'm mistaken. Also, attempt number would be a better name than attempt ID, but I think we're kind of stuck with attemptId for binary compatibility reasons. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-4014] Change TaskContext.attemptId to r...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3849#issuecomment-68404370 [Test build #24908 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24908/consoleFull) for PR 3849 at commit [`9d8d4d1`](https://github.com/apache/spark/commit/9d8d4d115449f48ce1714497e16704d4f12442d8). * This patch merges cleanly. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-4014] Change TaskContext.attemptId to r...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/3849#discussion_r22367485 --- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala --- @@ -635,7 +635,7 @@ class DAGScheduler( val rdd = job.finalStage.rdd val split = rdd.partitions(job.partitions(0)) val taskContext = -new TaskContextImpl(job.finalStage.id, job.partitions(0), 0, true) +new TaskContextImpl(job.finalStage.id, job.partitions(0), 0, 0, true) --- End diff -- can we use named argument for the two zeros and the true? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-4014] Change TaskContext.attemptId to r...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/3849#discussion_r22367512 --- Diff: core/src/main/scala/org/apache/spark/scheduler/Task.scala --- @@ -44,8 +44,9 @@ import org.apache.spark.util.Utils */ private[spark] abstract class Task[T](val stageId: Int, var partitionId: Int) extends Serializable { - final def run(attemptId: Long): T = { -context = new TaskContextImpl(stageId, partitionId, attemptId, runningLocally = false) + final def run(taskId: Long, attemptId: Long): T = { --- End diff -- would be great to add javadoc here --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-4014] Change TaskContext.attemptId to r...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/3849#discussion_r22367524 --- Diff: core/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosSchedulerBackend.scala --- @@ -290,13 +291,22 @@ private[spark] class MesosSchedulerBackend( .setType(Value.Type.SCALAR) .setScalar(Value.Scalar.newBuilder().setValue(scheduler.CPUS_PER_TASK).build()) .build() +// Encode the attemptId as part of the data payload, since there's not a MesosTaskInfo field +// to hold it: +val data = { --- End diff -- can we type data explicitly? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-4014] Change TaskContext.attemptId to r...
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/3849#discussion_r22367585 --- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala --- @@ -635,7 +635,7 @@ class DAGScheduler( val rdd = job.finalStage.rdd val split = rdd.partitions(job.partitions(0)) val taskContext = -new TaskContextImpl(job.finalStage.id, job.partitions(0), 0, true) +new TaskContextImpl(job.finalStage.id, job.partitions(0), 0, 0, true) --- End diff -- Good call; I didn't do this for the test code, but this line is in DAGScheduler so it should use named arguments. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-4014] Change TaskContext.attemptId to r...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/3849#discussion_r22367584 --- Diff: core/src/main/scala/org/apache/spark/executor/Executor.scala --- @@ -180,7 +189,7 @@ private[spark] class Executor( // Run the actual task and measure its runtime. taskStart = System.currentTimeMillis() -val value = task.run(taskId.toInt) +val value = task.run(taskId = taskId.toLong, attemptId = attemptId.toLong) --- End diff -- the toLong's are not necessary, are they? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-4014] Change TaskContext.attemptId to r...
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/3849#discussion_r22367616 --- Diff: core/src/main/scala/org/apache/spark/scheduler/Task.scala --- @@ -44,8 +44,9 @@ import org.apache.spark.util.Utils */ private[spark] abstract class Task[T](val stageId: Int, var partitionId: Int) extends Serializable { - final def run(attemptId: Long): T = { -context = new TaskContextImpl(stageId, partitionId, attemptId, runningLocally = false) + final def run(taskId: Long, attemptId: Long): T = { --- End diff -- Are the descriptions in TaskContext okay? If so, I'll just copy those for the parameters. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-4014] Change TaskContext.attemptId to r...
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/3849#discussion_r22367600 --- Diff: core/src/main/scala/org/apache/spark/executor/Executor.scala --- @@ -180,7 +189,7 @@ private[spark] class Executor( // Run the actual task and measure its runtime. taskStart = System.currentTimeMillis() -val value = task.run(taskId.toInt) +val value = task.run(taskId = taskId.toLong, attemptId = attemptId.toLong) --- End diff -- Good catch; this was a carryover from the old code. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-4014] Change TaskContext.attemptId to r...
Github user koeninger commented on the pull request: https://github.com/apache/spark/pull/3849#issuecomment-68405794 Thanks for this. Most of the uses of attemptId I've seen look like they were assuming it meant the 0-based attempt number. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-4014] Change TaskContext.attemptId to r...
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/3849#issuecomment-68406594 So personally I don't think we should change the semantics of `attemptId` because this has been exposed to user applications and they could silently break if we modify the meaning of the field (my original JIRA referred to an internal use of this). What it means right now is a global GUID over all attempts - that is a bit of an awkward definition, but I don't think it's fair to call this a bug - it was just a weird definition. So I'd be in favor of deprecating this in favor of `taskAttemptId` (a new field) and say that it was renamed to avoid confusion. Then we can add another field, `attemptCount` or `attemptNum` or something to convey the more intuitive thing. It will be slightly awkward, but if anyone reads the docs it should be obvious. In fact, we should probably spruce up the docs here for things like `partitionID` which right now are probably not super clear to users. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-4014] Change TaskContext.attemptId to r...
Github user koeninger commented on the pull request: https://github.com/apache/spark/pull/3849#issuecomment-68407388 The flip side is that it's already documented as doing the right thing: http://spark.apache.org/docs/1.1.1/api/scala/index.html#org.apache.spark.TaskContext val attemptId: Long the number of attempts to execute this task On Tue, Dec 30, 2014 at 4:38 PM, Patrick Wendell notificati...@github.com wrote: So personally I don't think we should change the semantics of attemptId because this has been exposed to user applications and they could silently break if we modify the meaning of the field (my original JIRA referred to an internal use of this). What it means right now is a global GUID over all attempts - that is a bit of an awkward definition, but I don't think it's fair to call this a bug - it was just a weird definition. So I'd be in favor of deprecating this in favor of taskAttemptId (a new field) and say that it was renamed to avoid confusion. Then we can add another field, attemptCount or attemptNum or something to convey the more intuitive thing. It will be slightly awkward, but if anyone reads the docs it should be obvious. In fact, we should probably spruce up the docs here for things like partitionID which right now are probably not super clear to users. â Reply to this email directly or view it on GitHub https://github.com/apache/spark/pull/3849#issuecomment-68406594. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-4014] Change TaskContext.attemptId to r...
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/3849#issuecomment-68407800 Ah I see - I didn't see the doc. I'm more on the fence in this case (because there was a doc that created a specification). So I guess I'm fine either way. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-4014] Change TaskContext.attemptId to r...
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/3849#issuecomment-68407820 Annoyingly, it looks like ScalaDoc doesn't display Javadoc annotations, so in the Scala documentation for 1.2.0 the TaskContext class appears to have lost all documentation, even though it still shows up in the Java docs: - http://spark.apache.org/docs/1.2.0/api/scala/index.html#org.apache.spark.TaskContext - https://spark.apache.org/docs/1.2.0/api/java/org/apache/spark/TaskContext.html I don't know how the docs for `attemptId` managed to get lost during the TaskContext stabilization patch, but this suggests that we need a better review checklist for public APIs which mandates full Scaladoc / Javadoc for all public interfaces. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-4014] Change TaskContext.attemptId to r...
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/3849#issuecomment-68408000 One potential consideration is backporing: if we agree that `attemptId`'s meaning should be changed to reflect the 1.1.1 documentation, then we should make the same fix in `branch-1.1`, which will require a separate patch because that branch uses a different implementation of TaskContext. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-4014] Change TaskContext.attemptId to r...
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/3849#issuecomment-68408128 Hm - we probably just shouldn't backport it, again I think users might be depending on the hold behavior, putting it in a patch release is a bit iffy. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-4014] Change TaskContext.attemptId to r...
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/3849#issuecomment-68408407 Should we at least leave a documentation note to inform users about the difference in behavior? I'm worried that someone will look at the 1.2 docs, write some code which relies on the correct behavior, then be surprised if they run it on an older release. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-4014] Change TaskContext.attemptId to r...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3849#issuecomment-68410782 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24908/ Test FAILed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-4014] Change TaskContext.attemptId to r...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3849#issuecomment-68410774 [Test build #24908 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24908/consoleFull) for PR 3849 at commit [`9d8d4d1`](https://github.com/apache/spark/commit/9d8d4d115449f48ce1714497e16704d4f12442d8). * This patch **fails MiMa tests**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `case class Sort(` --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-4014] Change TaskContext.attemptId to r...
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/3849#issuecomment-68412414 Hmm, looks like this failed MiMa checks due to the addition of a new method to a public interface: ``` [info] spark-core: found 1 potential binary incompatibilities (filtered 185) [error] * abstract method taskId()Long in class org.apache.spark.TaskContext does not have a correspondent in old version [error]filter with: ProblemFilters.exclude[MissingMethodProblem](org.apache.spark.TaskContext.taskId) ``` I'll add an exclusion. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-4014] Change TaskContext.attemptId to r...
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/3849#discussion_r22372823 --- Diff: core/src/main/scala/org/apache/spark/executor/Executor.scala --- @@ -180,7 +189,7 @@ private[spark] class Executor( // Run the actual task and measure its runtime. taskStart = System.currentTimeMillis() -val value = task.run(taskId.toInt) +val value = task.run(taskId = taskId.toLong, attemptId = attemptId.toLong) --- End diff -- It turns out that `attemptId` was a long but we only expected it to hold values that could fit in an int, hence this weird `.toLong` in a few places, plus a few `% Int.MaxValue` modulus calls in various places; I've cleaned this up. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-4014] Change TaskContext.attemptId to r...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3849#issuecomment-68418485 [Test build #24923 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24923/consoleFull) for PR 3849 at commit [`cbe4d76`](https://github.com/apache/spark/commit/cbe4d76a7968c467514c290fc56ece713225970a). * This patch merges cleanly. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-4014] Change TaskContext.attemptId to r...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3849#issuecomment-68421615 [Test build #24923 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24923/consoleFull) for PR 3849 at commit [`cbe4d76`](https://github.com/apache/spark/commit/cbe4d76a7968c467514c290fc56ece713225970a). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-4014] Change TaskContext.attemptId to r...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3849#issuecomment-68421619 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24923/ Test FAILed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-4014] Change TaskContext.attemptId to r...
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/3849#discussion_r22375124 --- Diff: core/src/test/scala/org/apache/spark/scheduler/TaskContextSuite.scala --- @@ -63,6 +63,33 @@ class TaskContextSuite extends FunSuite with BeforeAndAfter with LocalSparkConte verify(listener, times(1)).onTaskCompletion(any()) } + + test(TaskContext.attemptNumber should return attempt number, not task id (SPARK-4014)) { +sc = new SparkContext(local-cluster[2,1,512], test) --- End diff -- Just realized that I can change the `maxFailures` property on `local` instead of having to use `local-cluster`. Let me make that change, since it's a better practice and will speed up the tests. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-4014] Change TaskContext.attemptId to r...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3849#issuecomment-68426611 [Test build #24945 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24945/consoleFull) for PR 3849 at commit [`8c387ce`](https://github.com/apache/spark/commit/8c387ce76850caf2163dd82dc5c79b079788a921). * This patch merges cleanly. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org