Github user tdas commented on a diff in the pull request:

    https://github.com/apache/spark/pull/6901#discussion_r32867731

    --- Diff: docs/programming-guide.md ---
    @@ -1144,9 +1144,11 @@ generate these on the reduce side. When data does not fit in memory Spark will s
     to disk, incurring the additional overhead of disk I/O and increased garbage
     collection.

     Shuffle also generates a large number of intermediate files on disk. As of Spark 1.3, these files
    --- End diff --

    Oh! I thought you meant it as the latter ... "as of the latest version". This is a little confusing. :/ Maybe it makes sense to remove it completely. The GC-based behavior has been present for 4 versions now, since Spark 1.0, and it's not going to change in the foreseeable future. So it's best to remove it. The only thing that may change in Spark 1.5 is that we induce GC periodically ourselves.
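    (For context: the periodic GC mentioned above eventually surfaced as a driver-side configuration knob. A minimal sketch of how it is set, assuming the property name `spark.cleaner.periodicGC.interval` — it ultimately shipped in a release after 1.5, so treat the exact name and default here as assumptions for the timeframe of this discussion:)

    ```
    # spark-defaults.conf (sketch)
    # Ask Spark's ContextCleaner to trigger a JVM GC on the driver at a
    # fixed interval, so shuffle files belonging to unreferenced RDDs get
    # cleaned up even when the driver heap sees little allocation pressure.
    spark.cleaner.periodicGC.interval   30min
    ```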