[GitHub] spark pull request: [SPARK-733] Add documentation on use of accumu...

2015-01-16 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/4022


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-733] Add documentation on use of accumu...

2015-01-16 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/4022#issuecomment-70309350
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25671/
Test PASSed.





[GitHub] spark pull request: [SPARK-733] Add documentation on use of accumu...

2015-01-16 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4022#issuecomment-70309342
  
  [Test build #25671 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25671/consoleFull) for PR 4022 at commit [`587def5`](https://github.com/apache/spark/commit/587def543648c908e144027adb859f651cc9b574).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-733] Add documentation on use of accumu...

2015-01-16 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4022#issuecomment-70299268
  
  [Test build #25671 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25671/consoleFull) for PR 4022 at commit [`587def5`](https://github.com/apache/spark/commit/587def543648c908e144027adb859f651cc9b574).
 * This patch merges cleanly.





[GitHub] spark pull request: [SPARK-733] Add documentation on use of accumu...

2015-01-16 Thread pwendell
Github user pwendell commented on the pull request:

https://github.com/apache/spark/pull/4022#issuecomment-70299057
  
Okay new version LGTM! Jenkins, test this please.





[GitHub] spark pull request: [SPARK-733] Add documentation on use of accumu...

2015-01-16 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4022#issuecomment-70298700
  
  [Test build #25668 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25668/consoleFull) for PR 4022 at commit [`587def5`](https://github.com/apache/spark/commit/587def543648c908e144027adb859f651cc9b574).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-733] Add documentation on use of accumu...

2015-01-16 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/4022#issuecomment-70298713
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25668/
Test FAILed.





[GitHub] spark pull request: [SPARK-733] Add documentation on use of accumu...

2015-01-16 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4022#issuecomment-70290693
  
  [Test build #25668 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25668/consoleFull) for PR 4022 at commit [`587def5`](https://github.com/apache/spark/commit/587def543648c908e144027adb859f651cc9b574).
 * This patch merges cleanly.





[GitHub] spark pull request: [SPARK-733] Add documentation on use of accumu...

2015-01-16 Thread ilganeli
Github user ilganeli commented on a diff in the pull request:

https://github.com/apache/spark/pull/4022#discussion_r23095446
  
--- Diff: docs/programming-guide.md ---
@@ -1316,7 +1316,35 @@ For accumulator updates performed inside actions 
only, Spark guarantees t
 will only be applied once, i.e. restarted tasks will not update the value. 
In transformations, users should be aware 
 of that each task's update may be applied more than once if tasks or job 
stages are re-executed.
 
+In addition, accumulators do not maintain lineage for the operations that 
use them. Consequently, accumulator updates are not guaranteed to be executed 
when made within a lazy transformation like `map()`. Unless something has 
triggered the evaluation of the lazy transformation that updates the value of 
the accumlator, subsequent operations will not themselves trigger that 
evaluation and the value of the accumulator will remain unchanged. The below 
code fragment demonstrates this issue:
--- End diff --

Thanks for the suggestion - I've updated the doc.






[GitHub] spark pull request: [SPARK-733] Add documentation on use of accumu...

2015-01-16 Thread pwendell
Github user pwendell commented on a diff in the pull request:

https://github.com/apache/spark/pull/4022#discussion_r23072648
  
--- Diff: docs/programming-guide.md ---
@@ -1316,7 +1316,35 @@ For accumulator updates performed inside actions 
only, Spark guarantees t
 will only be applied once, i.e. restarted tasks will not update the value. 
In transformations, users should be aware 
 of that each task's update may be applied more than once if tasks or job 
stages are re-executed.
 
+In addition, accumulators do not maintain lineage for the operations that 
use them. Consequently, accumulator updates are not guaranteed to be executed 
when made within a lazy transformation like `map()`. Unless something has 
triggered the evaluation of the lazy transformation that updates the value of 
the accumlator, subsequent operations will not themselves trigger that 
evaluation and the value of the accumulator will remain unchanged. The below 
code fragment demonstrates this issue:
--- End diff --

I found this worded a bit confusingly: what would it mean for an 
accumulator to "maintain lineage"? I think this comes from @JoshRosen's PR 
description, but IMO it might be better to remove that particular phrasing. 
What about a slight rewording:

```
Accumulators do not change the lazy evaluation model of Spark. Their value 
is only updated once the RDD in which they are being modified is computed as 
part of an action. The below code fragment demonstrates this property:
```

I also didn't call it an "issue" because it's just a property of how they 
work; I don't think it's necessarily a bug.
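
For readers without a Spark cluster at hand, the property described above can be approximated in plain Python, since the built-in `map()` is also lazy. This is only an analogy, not Spark code: the `Counter` class and the `add_and_pass` helper are hypothetical stand-ins for an accumulator and for the function passed to a transformation, and `sum()` plays the role of an action.

```python
# Plain-Python analogy (NOT the Spark API): built-in map() is lazy,
# so side effects inside the mapped function do not run until the
# result is consumed, much like a transformation waits for an action.

class Counter:
    """Hypothetical stand-in for an accumulator: a mutable running total."""

    def __init__(self):
        self.value = 0

    def add(self, n):
        self.value += n


def add_and_pass(counter, x):
    """Hypothetical mapped function: update the counter, return x unchanged."""
    counter.add(x)
    return x


accum = Counter()
data = [1, 2, 3, 4]

# Building the lazy pipeline executes nothing yet.
mapped = map(lambda x: add_and_pass(accum, x), data)
value_before = accum.value  # the counter has not been touched

# Consuming the iterator (the analogue of an action) runs the updates.
total = sum(mapped)
value_after = accum.value  # the counter now reflects every element
```

Reading `value_before` and `value_after` shows the counter changing only once the pipeline is consumed, which is the same reason an accumulator updated inside `map()` stays unchanged until an action computes the RDD.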





[GitHub] spark pull request: [SPARK-733] Add documentation on use of accumu...

2015-01-14 Thread squito
Github user squito commented on the pull request:

https://github.com/apache/spark/pull/4022#issuecomment-70036623
  
lgtm





[GitHub] spark pull request: [SPARK-733] Add documentation on use of accumu...

2015-01-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/4022#issuecomment-69798871
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25474/
Test PASSed.





[GitHub] spark pull request: [SPARK-733] Add documentation on use of accumu...

2015-01-13 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4022#issuecomment-69798858
  
  [Test build #25474 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25474/consoleFull) for PR 4022 at commit [`df3afd7`](https://github.com/apache/spark/commit/df3afd7895b97c5280bf28d8c24e543d60775834).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `val classServer = new HttpServer(conf, outputDir, new SecurityManager(conf), classServerPort, "HTTP class server")`






[GitHub] spark pull request: [SPARK-733] Add documentation on use of accumu...

2015-01-13 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4022#issuecomment-69788388
  
  [Test build #25474 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25474/consoleFull) for PR 4022 at commit [`df3afd7`](https://github.com/apache/spark/commit/df3afd7895b97c5280bf28d8c24e543d60775834).
 * This patch merges cleanly.





[GitHub] spark pull request: [SPARK-733] Add documentation on use of accumu...

2015-01-13 Thread ilganeli
GitHub user ilganeli opened a pull request:

https://github.com/apache/spark/pull/4022

[SPARK-733] Add documentation on use of accumulators in lazy transformation

I've added documentation that addresses the ambiguity highlighted in the 
relevant JIRA, along with code examples that illustrate the behavior.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/ilganeli/spark SPARK-733

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/4022.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #4022


commit c64da4fabbd805bf289780bbd3fe0f6aec435957
Author: Ilya Ganelin 
Date:   2015-01-07T21:05:30Z

Partially updated task metrics to make some vars private

commit 5525c2011d96929d6143e24cf1abdd9823a9fe26
Author: Ilya Ganelin 
Date:   2015-01-07T21:25:40Z

Completed refactoring to make vars in TaskMetrics class private

commit 1fd59b27a91d645b15e7c701cedb8bc56fbd987a
Author: Ilya Ganelin 
Date:   2015-01-08T17:08:10Z

Updated documentation for accumulators to highlight lazy evaluation issue

commit 33b5a2d4f0ea354d546fdefb0eaed3330709f3c2
Author: Ilya Ganelin 
Date:   2015-01-08T17:14:32Z

Added code examples for java and python

commit 3a38db1ea96f0b1045f78fb5aa093be302c1ab42
Author: Ilya Ganelin 
Date:   2015-01-09T18:30:55Z

Verified documentation update by building via jekyll

commit 4dc2cdbb9806d01df0f3192469b6e129f4d2de29
Author: Ilya Ganelin 
Date:   2015-01-09T18:31:15Z

Merge remote-tracking branch 'upstream/master' into SPARK-733

commit 58034fb6b867db97bbcd74dea6c9678c3eea2948
Author: Ilya Ganelin 
Date:   2015-01-13T17:45:04Z

Merge remote-tracking branch 'upstream/master' into SPARK-733

commit 3f6c5127b094519be0fe3d7ec99a8654d3a58728
Author: Ilya Ganelin 
Date:   2015-01-13T17:51:03Z

Revert "Completed refactoring to make vars in TaskMetrics class private"

This reverts commit 5525c2011d96929d6143e24cf1abdd9823a9fe26.

commit df3afd7895b97c5280bf28d8c24e543d60775834
Author: Ilya Ganelin 
Date:   2015-01-13T17:51:18Z

Revert "Partially updated task metrics to make some vars private"

This reverts commit c64da4fabbd805bf289780bbd3fe0f6aec435957.



