[GitHub] spark pull request: [SPARK-5836] [DOCS] [STREAMING] Clarify what m...
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/6901#discussion_r32868371

--- Diff: docs/programming-guide.md ---
@@ -1144,9 +1144,11 @@ generate these on the reduce side. When data does not fit in memory Spark will s
to disk, incurring the additional overhead of disk I/O and increased garbage collection.
Shuffle also generates a large number of intermediate files on disk. As of Spark 1.3, these files
--- End diff --

I agree it could be removed too, even if it probably doesn't matter at this point since we are well beyond 1.3.

---
If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA.
---

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-5836] [DOCS] [STREAMING] Clarify what m...
Github user tdas commented on a diff in the pull request:

https://github.com/apache/spark/pull/6901#discussion_r32867731

--- Diff: docs/programming-guide.md ---
@@ -1144,9 +1144,11 @@ generate these on the reduce side. When data does not fit in memory Spark will s
to disk, incurring the additional overhead of disk I/O and increased garbage collection.
Shuffle also generates a large number of intermediate files on disk. As of Spark 1.3, these files
--- End diff --

Oh! I thought you meant it as the latter ... "as of the latest version". This is a little confusing. :/ Maybe it makes sense to remove it completely. The GC-based behavior has been present for 4 versions now, since Spark 1.0, and it's not gonna change in the foreseeable future. So it's best to remove it. The only thing that may change in Spark 1.5 is that we induce GC periodically ourselves.
[GitHub] spark pull request: [SPARK-5836] [DOCS] [STREAMING] Clarify what m...
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/6901#discussion_r32857157

--- Diff: docs/programming-guide.md ---
@@ -1144,9 +1144,11 @@ generate these on the reduce side. When data does not fit in memory Spark will s
to disk, incurring the additional overhead of disk I/O and increased garbage collection.
Shuffle also generates a large number of intermediate files on disk. As of Spark 1.3, these files
--- End diff --

In this case I think the sense was '... in 1.3 and not before', so it can stay as is. Yes, in cases where the meaning is '... as of the latest version, which is currently 1.3, and maybe beyond' then it makes sense to introduce a replacement, or just remove the text altogether.
[GitHub] spark pull request: [SPARK-5836] [DOCS] [STREAMING] Clarify what m...
Github user tdas commented on a diff in the pull request:

https://github.com/apache/spark/pull/6901#discussion_r32855391

--- Diff: docs/programming-guide.md ---
@@ -1144,9 +1144,11 @@ generate these on the reduce side. When data does not fit in memory Spark will s
to disk, incurring the additional overhead of disk I/O and increased garbage collection.
Shuffle also generates a large number of intermediate files on disk. As of Spark 1.3, these files
--- End diff --

I know this has been merged, but an annoying issue that I have found in docs (including mine, so I am guilty too) is the use of this `as of Spark X`. No one remembers to search for this pattern and it never gets updated. Rather we should use markdown variables, `as of Spark {{site.SPARK_VERSION_SHORT}}`.
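The search for hardcoded `as of Spark X` phrases that the comment above says nobody remembers to do could be automated. The following is only a minimal sketch under stated assumptions: the regular expression, the function name `find_hardcoded_versions`, and the idea of running it over the docs text are all hypothetical and are not part of the Spark build or of this pull request.

```python
import re

# Hypothetical pattern: matches phrases such as "as of Spark 1.3" (any case),
# but not the templated form "as of Spark {{site.SPARK_VERSION_SHORT}}".
HARDCODED_VERSION = re.compile(r"\bas of Spark \d+(?:\.\d+)*", re.IGNORECASE)


def find_hardcoded_versions(text):
    """Return (line_number, matched_phrase) pairs for each hardcoded mention."""
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        for match in HARDCODED_VERSION.finditer(line):
            hits.append((number, match.group(0)))
    return hits
```

Such a check could be run over each Markdown file under `docs/` before a release, so that every reported phrase is either confirmed as meaning "in this version and not before" or replaced with the site variable.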
[GitHub] spark pull request: [SPARK-5836] [DOCS] [STREAMING] Clarify what m...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/6901
[GitHub] spark pull request: [SPARK-5836] [DOCS] [STREAMING] Clarify what m...
Github user andrewor14 commented on the pull request: https://github.com/apache/spark/pull/6901#issuecomment-113592294 LGTM. Eventually we want to address this behavior by forcing a periodic GC (once every 30 minutes or something should be inexpensive). For now this is a better description to have. Merging into master 1.4.
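To illustrate the idea in the comment above (cleanup that only happens when garbage collection runs, plus a periodic forced GC), here is a small Python analogy. It is not Spark's implementation: the class `ShuffleLikeData`, the helper `start_periodic_gc`, and the temporary-file naming are all hypothetical, and Python's collector differs from the JVM's. The sketch only shows that a file tied to an object's lifetime lingers until a collection actually happens.

```python
import gc
import os
import tempfile
import threading
import weakref


def _remove_quietly(path):
    """Delete the file at path, ignoring the case where it is already gone."""
    try:
        os.remove(path)
    except FileNotFoundError:
        pass


class ShuffleLikeData:
    """Hypothetical stand-in for an object that owns an intermediate file."""

    def __init__(self):
        fd, self.path = tempfile.mkstemp(prefix="shuffle-like-")
        os.close(fd)
        # The self-reference forms a cycle, so reference counting alone
        # cannot reclaim the object; only a cyclic GC pass can.
        self.cycle = self
        # Remove the file once the object has been garbage collected.
        weakref.finalize(self, _remove_quietly, self.path)


def start_periodic_gc(interval_seconds, stop_event):
    """Call gc.collect() every interval_seconds until stop_event is set."""

    def loop():
        # Event.wait returns False on timeout and True once the event is set.
        while not stop_event.wait(interval_seconds):
            gc.collect()

    thread = threading.Thread(target=loop, daemon=True)
    thread.start()
    return thread
```

In a long-running process with little memory pressure, a collection may not happen for a long time, so files like these can accumulate. A call such as `start_periodic_gc(30 * 60, threading.Event())` would mirror the "once every 30 minutes" suggestion.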
[GitHub] spark pull request: [SPARK-5836] [DOCS] [STREAMING] Clarify what m...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6901#issuecomment-113491882 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-5836] [DOCS] [STREAMING] Clarify what m...
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/6901#issuecomment-113491777

[Test build #35260 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/35260/console) for PR 6901 at commit [`a9faef0`](https://github.com/apache/spark/commit/a9faef078cbcf09bd741feac143bf25fe6dc6e7f).

* This patch **passes all tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARK-5836] [DOCS] [STREAMING] Clarify what m...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/6901#issuecomment-113461078 [Test build #35260 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/35260/consoleFull) for PR 6901 at commit [`a9faef0`](https://github.com/apache/spark/commit/a9faef078cbcf09bd741feac143bf25fe6dc6e7f).
[GitHub] spark pull request: [SPARK-5836] [DOCS] [STREAMING] Clarify what m...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6901#issuecomment-113460929 Merged build triggered.
[GitHub] spark pull request: [SPARK-5836] [DOCS] [STREAMING] Clarify what m...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6901#issuecomment-113460946 Merged build started.
[GitHub] spark pull request: [SPARK-5836] [DOCS] [STREAMING] Clarify what m...
GitHub user srowen opened a pull request:

https://github.com/apache/spark/pull/6901

[SPARK-5836] [DOCS] [STREAMING] Clarify what may cause long-running Spark apps to preserve shuffle files

Clarify what may cause long-running Spark apps to preserve shuffle files

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/srowen/spark SPARK-5836

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/6901.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

This closes #6901

commit a9faef078cbcf09bd741feac143bf25fe6dc6e7f
Author: Sean Owen
Date: 2015-06-19T10:15:03Z

Clarify what may cause long-running Spark apps to preserve shuffle files