[GitHub] spark pull request: [SPARK-5836] [DOCS] [STREAMING] Clarify what m...

2015-06-19 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/6901#discussion_r32868371
  
--- Diff: docs/programming-guide.md ---
@@ -1144,9 +1144,11 @@ generate these on the reduce side. When data does not fit in memory Spark will s
 to disk, incurring the additional overhead of disk I/O and increased garbage collection.
 
 Shuffle also generates a large number of intermediate files on disk. As of Spark 1.3, these files
--- End diff --

I agree it could be removed too, even if it probably doesn't matter at this 
point since we are well beyond 1.3.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-5836] [DOCS] [STREAMING] Clarify what m...

2015-06-19 Thread tdas
Github user tdas commented on a diff in the pull request:

https://github.com/apache/spark/pull/6901#discussion_r32867731
  
--- Diff: docs/programming-guide.md ---
@@ -1144,9 +1144,11 @@ generate these on the reduce side. When data does not fit in memory Spark will s
 to disk, incurring the additional overhead of disk I/O and increased garbage collection.
 
 Shuffle also generates a large number of intermediate files on disk. As of Spark 1.3, these files
--- End diff --

Oh! I thought you meant it as the latter ... "as of the latest version". 
This is a little confusing. :/
Maybe it makes sense to remove it completely. The GC-based behavior has been 
present for four versions now, since Spark 1.0, and it's not going to change 
in the foreseeable future. So it's best to remove it. The only thing that may 
change in Spark 1.5 is that we induce GC periodically ourselves. 





[GitHub] spark pull request: [SPARK-5836] [DOCS] [STREAMING] Clarify what m...

2015-06-19 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/6901#discussion_r32857157
  
--- Diff: docs/programming-guide.md ---
@@ -1144,9 +1144,11 @@ generate these on the reduce side. When data does not fit in memory Spark will s
 to disk, incurring the additional overhead of disk I/O and increased garbage collection.
 
 Shuffle also generates a large number of intermediate files on disk. As of Spark 1.3, these files
--- End diff --

In this case I think the sense was '... in 1.3 and not before', so it can 
stay as is. Yes, in cases where the meaning is '... as of the latest version, 
which is currently 1.3, and maybe beyond' then it makes sense to introduce a 
replacement, or just remove the text altogether.





[GitHub] spark pull request: [SPARK-5836] [DOCS] [STREAMING] Clarify what m...

2015-06-19 Thread tdas
Github user tdas commented on a diff in the pull request:

https://github.com/apache/spark/pull/6901#discussion_r32855391
  
--- Diff: docs/programming-guide.md ---
@@ -1144,9 +1144,11 @@ generate these on the reduce side. When data does not fit in memory Spark will s
 to disk, incurring the additional overhead of disk I/O and increased garbage collection.
 
 Shuffle also generates a large number of intermediate files on disk. As of Spark 1.3, these files
--- End diff --

I know this has been merged, but an annoying issue that I have found in docs 
(including mine, so I am guilty too) is the use of this `as of Spark X` 
phrasing. No one remembers to search for this pattern, so it never gets 
updated. Rather, we should use markdown variables: `as of Spark {{site.SPARK_VERSION_SHORT}}`.
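
A hypothetical before/after sketch of that suggestion, using the sentence quoted in the diff above. It assumes the docs build defines `SPARK_VERSION_SHORT` as a Jekyll site variable (the docs are built with Jekyll, which substitutes `{{site.*}}` Liquid expressions at build time):

```markdown
<!-- Hard-coded version: goes stale as soon as the next release ships -->
Shuffle also generates a large number of intermediate files on disk. As of
Spark 1.3, these files ...

<!-- Jekyll site variable: replaced with the current version at doc-build time -->
Shuffle also generates a large number of intermediate files on disk. As of
Spark {{site.SPARK_VERSION_SHORT}}, these files ...
```

Note this only fits the "as of the latest version" reading discussed below; where "as of Spark 1.3" means "in 1.3 and not before", the hard-coded version is the correct one.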





[GitHub] spark pull request: [SPARK-5836] [DOCS] [STREAMING] Clarify what m...

2015-06-19 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/6901





[GitHub] spark pull request: [SPARK-5836] [DOCS] [STREAMING] Clarify what m...

2015-06-19 Thread andrewor14
Github user andrewor14 commented on the pull request:

https://github.com/apache/spark/pull/6901#issuecomment-113592294
  
LGTM. Eventually we want to address this behavior by forcing a periodic GC 
(once every 30 minutes or so should be inexpensive). For now this is a 
better description to have. Merging into master and 1.4.
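
The periodic-GC idea mentioned here can be sketched as a small standalone scheduler. This is illustrative only, not Spark API: the `PeriodicGC` object and its injectable `action` hook are hypothetical names. The rationale is that Spark reclaims shuffle files only when the driver garbage-collects the corresponding RDD references, so on a long-running driver with little heap pressure, forcing a GC on a timer bounds how long stale files linger:

```scala
import java.util.concurrent.{Executors, ScheduledExecutorService, TimeUnit}

// Hypothetical sketch: trigger a JVM-wide GC on a fixed schedule so that
// weak-reference-based cleanup of shuffle files gets a chance to run even
// when the driver heap sees little allocation pressure. The 30-minute
// default is the figure from the review comment, not a tuned value.
object PeriodicGC {
  private var scheduler: ScheduledExecutorService = _

  // `action` is injectable purely so the scheduling logic is testable;
  // in real use it would just be System.gc().
  def start(period: Long,
            unit: TimeUnit = TimeUnit.MINUTES,
            action: () => Unit = () => System.gc()): Unit = {
    scheduler = Executors.newSingleThreadScheduledExecutor()
    scheduler.scheduleAtFixedRate(
      new Runnable { def run(): Unit = action() },
      period, period, unit)
  }

  def stop(): Unit = if (scheduler != null) scheduler.shutdown()
}
```

A typical use would be `PeriodicGC.start(30)` once at driver startup and `PeriodicGC.stop()` on shutdown.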





[GitHub] spark pull request: [SPARK-5836] [DOCS] [STREAMING] Clarify what m...

2015-06-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/6901#issuecomment-113491882
  
Merged build finished. Test PASSed.





[GitHub] spark pull request: [SPARK-5836] [DOCS] [STREAMING] Clarify what m...

2015-06-19 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/6901#issuecomment-113491777
  
[Test build #35260 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/35260/console) for PR 6901 at commit [`a9faef0`](https://github.com/apache/spark/commit/a9faef078cbcf09bd741feac143bf25fe6dc6e7f).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-5836] [DOCS] [STREAMING] Clarify what m...

2015-06-19 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/6901#issuecomment-113461078
  
[Test build #35260 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/35260/consoleFull) for PR 6901 at commit [`a9faef0`](https://github.com/apache/spark/commit/a9faef078cbcf09bd741feac143bf25fe6dc6e7f).





[GitHub] spark pull request: [SPARK-5836] [DOCS] [STREAMING] Clarify what m...

2015-06-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/6901#issuecomment-113460929
  
 Merged build triggered.





[GitHub] spark pull request: [SPARK-5836] [DOCS] [STREAMING] Clarify what m...

2015-06-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/6901#issuecomment-113460946
  
Merged build started.





[GitHub] spark pull request: [SPARK-5836] [DOCS] [STREAMING] Clarify what m...

2015-06-19 Thread srowen
GitHub user srowen opened a pull request:

https://github.com/apache/spark/pull/6901

[SPARK-5836] [DOCS] [STREAMING] Clarify what may cause long-running Spark 
apps to preserve shuffle files

Clarify what may cause long-running Spark apps to preserve shuffle files

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/srowen/spark SPARK-5836

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/6901.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #6901


commit a9faef078cbcf09bd741feac143bf25fe6dc6e7f
Author: Sean Owen 
Date:   2015-06-19T10:15:03Z

Clarify what may cause long-running Spark apps to preserve shuffle files



