[GitHub] spark issue #14039: [SPARK-15896][SQL] Clean up shuffle files just after job...

maropu Mon, 04 Jul 2016 19:16:48 -0700

Github user maropu commented on the issue:

    https://github.com/apache/spark/pull/14039
  
    @markhamstra Thanks for the comment. I think the reuse of fragments depends
    heavily on the user's queries, the Catalyst optimizer, cluster resources, and
    so on. Reusing `ShuffledRowRDD` shuffle data within a single job is a good
    idea, but keeping that data around across multiple jobs seems difficult:
    Spark cannot know when the data should be garbage-collected, and it could
    consume a lot of disk space. I think a caching mechanism is a better way to
    reuse fragments across multiple jobs. Or do you have a smart/concrete idea
    for reusing the shuffle data?
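The lifetime argument above can be illustrated with a toy model. This is a minimal sketch under stated assumptions, not Spark's shuffle or cleanup code: `ShuffleRegistry`, `write`, `pin`, `unpin`, and `job_finished` are hypothetical names invented for illustration. Unpinned shuffle output is deleted as soon as the job that produced it finishes (the engine knows that job's consumers), while output the user explicitly "pins" (analogous in spirit to caching with `persist()`/`unpersist()`) survives until the user releases it, holding disk space in the meantime.

```python
class ShuffleRegistry:
    """Hypothetical model of shuffle-output lifetimes (NOT Spark's API)."""

    def __init__(self):
        self.files = {}       # shuffle_id -> size in bytes (stand-in for files on disk)
        self.owner_job = {}   # shuffle_id -> id of the job that produced it
        self.pinned = set()   # shuffle ids the user explicitly asked to keep

    def write(self, job_id, shuffle_id, size):
        # Record shuffle output produced by a job.
        self.files[shuffle_id] = size
        self.owner_job[shuffle_id] = job_id

    def pin(self, shuffle_id):
        # The user, not the engine, decides this output outlives its job
        # (similar in spirit to caching a Dataset).
        self.pinned.add(shuffle_id)

    def unpin(self, shuffle_id):
        # The user releases the output; only now can it be deleted.
        self.pinned.discard(shuffle_id)
        self.files.pop(shuffle_id, None)
        self.owner_job.pop(shuffle_id, None)

    def job_finished(self, job_id):
        # Within one job all consumers are known, so unpinned output
        # can be cleaned up immediately when the job ends.
        for sid in [s for s, j in self.owner_job.items() if j == job_id]:
            if sid not in self.pinned:
                self.files.pop(sid, None)
                del self.owner_job[sid]

    def disk_usage(self):
        return sum(self.files.values())


reg = ShuffleRegistry()
reg.write(1, "s1", 100)
reg.write(1, "s2", 50)
reg.pin("s2")
reg.job_finished(1)
print(reg.disk_usage())  # 50: only the pinned output survives the job
reg.unpin("s2")
print(reg.disk_usage())  # 0
```

The design choice mirrors the trade-off in the comment: automatic cleanup is safe only when the engine knows every consumer, so cross-job reuse is left to an explicit, user-managed mechanism.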



