[GitHub] spark pull request: [SPARK-733] Add documentation on use of accumu...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/4022 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-733] Add documentation on use of accumu...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4022#issuecomment-70309350 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25671/ Test PASSed.
[GitHub] spark pull request: [SPARK-733] Add documentation on use of accumu...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4022#issuecomment-70309342 [Test build #25671 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25671/consoleFull) for PR 4022 at commit [`587def5`](https://github.com/apache/spark/commit/587def543648c908e144027adb859f651cc9b574). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-733] Add documentation on use of accumu...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4022#issuecomment-70299268 [Test build #25671 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25671/consoleFull) for PR 4022 at commit [`587def5`](https://github.com/apache/spark/commit/587def543648c908e144027adb859f651cc9b574). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-733] Add documentation on use of accumu...
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/4022#issuecomment-70299057 Okay new version LGTM! Jenkins, test this please.
[GitHub] spark pull request: [SPARK-733] Add documentation on use of accumu...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4022#issuecomment-70298700 [Test build #25668 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25668/consoleFull) for PR 4022 at commit [`587def5`](https://github.com/apache/spark/commit/587def543648c908e144027adb859f651cc9b574). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-733] Add documentation on use of accumu...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4022#issuecomment-70298713 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25668/ Test FAILed.
[GitHub] spark pull request: [SPARK-733] Add documentation on use of accumu...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4022#issuecomment-70290693 [Test build #25668 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25668/consoleFull) for PR 4022 at commit [`587def5`](https://github.com/apache/spark/commit/587def543648c908e144027adb859f651cc9b574). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-733] Add documentation on use of accumu...
Github user ilganeli commented on a diff in the pull request: https://github.com/apache/spark/pull/4022#discussion_r23095446

--- Diff: docs/programming-guide.md ---
@@ -1316,7 +1316,35 @@ For accumulator updates performed inside actions only, Spark guarantees t
 will only be applied once, i.e. restarted tasks will not update the value. In transformations, users should be aware of that each task's update may be applied more than once if tasks or job stages are re-executed.
+In addition, accumulators do not maintain lineage for the operations that use them. Consequently, accumulator updates are not guaranteed to be executed when made within a lazy transformation like `map()`. Unless something has triggered the evaluation of the lazy transformation that updates the value of the accumlator, subsequent operations will not themselves trigger that evaluation and the value of the accumulator will remain unchanged. The below code fragment demonstrates this issue:
--- End diff --

Thanks for the suggestion - I've updated the doc.
[GitHub] spark pull request: [SPARK-733] Add documentation on use of accumu...
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/4022#discussion_r23072648

--- Diff: docs/programming-guide.md ---
@@ -1316,7 +1316,35 @@ For accumulator updates performed inside actions only, Spark guarantees t
 will only be applied once, i.e. restarted tasks will not update the value. In transformations, users should be aware of that each task's update may be applied more than once if tasks or job stages are re-executed.
+In addition, accumulators do not maintain lineage for the operations that use them. Consequently, accumulator updates are not guaranteed to be executed when made within a lazy transformation like `map()`. Unless something has triggered the evaluation of the lazy transformation that updates the value of the accumlator, subsequent operations will not themselves trigger that evaluation and the value of the accumulator will remain unchanged. The below code fragment demonstrates this issue:
--- End diff --

I found this is worded a bit confusingly: what would it mean for an accumulator to "maintain lineage"? I think this is from @JoshRosen's PR description, but IMO it might be better to remove that particular phrasing. What about a slight re-wording:

```
Accumulators do not change the lazy evaluation model of Spark. Their value is only updated once the RDD in which they are being modified is computed as part of an action. The below code fragment demonstrates this property:
```

I also didn't call it an "issue" because it's just a property of how they work, I don't think it's necessarily a bug.
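The property described in the proposed re-wording can be sketched with a plain-Python analogy. This is not Spark code and not the fragment from the patch: it relies only on the fact that Python 3's built-in `map()` is lazy, much as an RDD `map()` is, and the `Counter` class and `add_and_pass_through` helper are hypothetical stand-ins for an accumulator and a mapped function.

```python
# Plain-Python analogy (assumption: Python 3, where map() returns a lazy iterator).
# Counter and add_and_pass_through are hypothetical names used for illustration only.

class Counter:
    """Stand-in for an accumulator: a value that is only ever added to."""

    def __init__(self, value=0):
        self.value = value

    def add(self, amount):
        self.value += amount


def add_and_pass_through(counter, x):
    """Side effect on the counter, then return the element unchanged."""
    counter.add(x)
    return x


accum = Counter(0)
data = [1, 2, 3, 4]

# Lazy "transformation": building the iterator runs none of the updates.
mapped = map(lambda x: add_and_pass_through(accum, x), data)
value_before = accum.value  # no element has been processed yet

# "Action": consuming the iterator forces evaluation and applies the updates.
total = sum(mapped)
value_after = accum.value

print(value_before, value_after, total)
```

In actual Spark code the same shape would presumably use an accumulator created from the `SparkContext` and an action on the mapped RDD; the analogy only illustrates that the update is deferred until something forces the computation.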
[GitHub] spark pull request: [SPARK-733] Add documentation on use of accumu...
Github user squito commented on the pull request: https://github.com/apache/spark/pull/4022#issuecomment-70036623 lgtm
[GitHub] spark pull request: [SPARK-733] Add documentation on use of accumu...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4022#issuecomment-69798871 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25474/ Test PASSed.
[GitHub] spark pull request: [SPARK-733] Add documentation on use of accumu...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4022#issuecomment-69798858 [Test build #25474 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25474/consoleFull) for PR 4022 at commit [`df3afd7`](https://github.com/apache/spark/commit/df3afd7895b97c5280bf28d8c24e543d60775834). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `val classServer = new HttpServer(conf, outputDir, new SecurityManager(conf), classServerPort, "HTTP class server")`
[GitHub] spark pull request: [SPARK-733] Add documentation on use of accumu...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4022#issuecomment-69788388 [Test build #25474 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25474/consoleFull) for PR 4022 at commit [`df3afd7`](https://github.com/apache/spark/commit/df3afd7895b97c5280bf28d8c24e543d60775834). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-733] Add documentation on use of accumu...
GitHub user ilganeli opened a pull request: https://github.com/apache/spark/pull/4022

[SPARK-733] Add documentation on use of accumulators in lazy transformation

I've added documentation clarifying the particular lack of clarity highlighted in the relevant JIRA. I've also added code examples for this issue to clarify the explanation.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/ilganeli/spark SPARK-733

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/4022.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #4022

commit c64da4fabbd805bf289780bbd3fe0f6aec435957
Author: Ilya Ganelin
Date: 2015-01-07T21:05:30Z
Partially updated task metrics to make some vars private

commit 5525c2011d96929d6143e24cf1abdd9823a9fe26
Author: Ilya Ganelin
Date: 2015-01-07T21:25:40Z
Completed refactoring to make vars in TaskMetrics class private

commit 1fd59b27a91d645b15e7c701cedb8bc56fbd987a
Author: Ilya Ganelin
Date: 2015-01-08T17:08:10Z
Updated documentation for accumulators to highlight lazy evaluation issue

commit 33b5a2d4f0ea354d546fdefb0eaed3330709f3c2
Author: Ilya Ganelin
Date: 2015-01-08T17:14:32Z
Added code examples for java and python

commit 3a38db1ea96f0b1045f78fb5aa093be302c1ab42
Author: Ilya Ganelin
Date: 2015-01-09T18:30:55Z
Verified documentation update by building via jekyll

commit 4dc2cdbb9806d01df0f3192469b6e129f4d2de29
Author: Ilya Ganelin
Date: 2015-01-09T18:31:15Z
Merge remote-tracking branch 'upstream/master' into SPARK-733

commit 58034fb6b867db97bbcd74dea6c9678c3eea2948
Author: Ilya Ganelin
Date: 2015-01-13T17:45:04Z
Merge remote-tracking branch 'upstream/master' into SPARK-733

commit 3f6c5127b094519be0fe3d7ec99a8654d3a58728
Author: Ilya Ganelin
Date: 2015-01-13T17:51:03Z
Revert "Completed refactoring to make vars in TaskMetrics class private"
This reverts commit 5525c2011d96929d6143e24cf1abdd9823a9fe26.

commit df3afd7895b97c5280bf28d8c24e543d60775834
Author: Ilya Ganelin
Date: 2015-01-13T17:51:18Z
Revert "Partially updated task metrics to make some vars private"
This reverts commit c64da4fabbd805bf289780bbd3fe0f6aec435957.